Emotion transplantation through adaptation in HMM-based speech synthesis

Times Cited: 24
Authors
Lorenzo-Trueba, Jaime [1 ]
Barra-Chicote, Roberto [1 ]
San-Segundo, Ruben [1 ]
Ferreiros, Javier [1 ]
Yamagishi, Junichi [2 ]
Montero, Juan M. [1 ]
Affiliations
[1] Univ Politecn Madrid, ETSI Telecomunicac, Speech Technol Grp, E-28040 Madrid, Spain
[2] Univ Edinburgh, CSTR, Informat Forum, Edinburgh EH8 9AB, Midlothian, Scotland
Source
COMPUTER SPEECH AND LANGUAGE, 2015, Vol. 34, No. 1
Keywords
Statistical parametric speech synthesis; Expressive speech synthesis; Cascade adaptation; Emotion transplantation
DOI
10.1016/j.csl.2015.03.008
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an emotion transplantation method capable of modifying a synthetic speech model through the use of CSMAPLR adaptation in order to incorporate emotional information learned from a different speaker model while maintaining the identity of the original speaker as much as possible. The proposed method relies on learning both emotional and speaker identity information by means of their adaptation functions from an average voice model, and on combining them into a single cascade transform capable of imbuing the desired emotion into the target speaker. The method is then applied to the task of transplanting four emotions (anger, happiness, sadness and surprise) into three male and three female speakers and is evaluated in a number of perceptual tests. The evaluation results show that, for emotional text, perceived naturalness significantly favors the proposed transplanted emotional speech synthesis over traditional neutral speech synthesis, with a substantial increase in the perceived emotional strength of the synthesized utterances at a slight cost in speech quality. A final evaluation with a robotic laboratory assistant application shows that using emotional speech significantly increases the students' satisfaction with the dialog system, demonstrating that the proposed emotion transplantation system provides benefits in real applications. (C) 2015 Elsevier Ltd. All rights reserved.
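For readers unfamiliar with cascaded adaptation, the sketch below illustrates, in assumed notation, how two CSMAPLR-style affine transforms (a speaker-identity transform [A_spk, b_spk] and an emotion transform [A_emo, b_emo], both estimated against the same average voice model with Gaussian parameters mu_avg, Sigma_avg) can be composed into a single affine cascade transform. The symbols and the ordering shown (speaker first, emotion second) are illustrative assumptions, not a reproduction of the paper's exact formulation.

% Cascade of CSMAPLR-style mean/covariance transforms (illustrative notation).
\begin{align}
  \hat{\mu}    &= A_{\mathrm{emo}}\bigl(A_{\mathrm{spk}}\,\mu_{\mathrm{avg}} + b_{\mathrm{spk}}\bigr) + b_{\mathrm{emo}}
                = A_{\mathrm{casc}}\,\mu_{\mathrm{avg}} + b_{\mathrm{casc}},\\
  \hat{\Sigma} &= A_{\mathrm{emo}} A_{\mathrm{spk}}\,\Sigma_{\mathrm{avg}}\,A_{\mathrm{spk}}^{\top} A_{\mathrm{emo}}^{\top}
                = A_{\mathrm{casc}}\,\Sigma_{\mathrm{avg}}\,A_{\mathrm{casc}}^{\top},
\end{align}
% where the combined transform is itself affine:
\begin{equation}
  A_{\mathrm{casc}} = A_{\mathrm{emo}} A_{\mathrm{spk}}, \qquad
  b_{\mathrm{casc}} = A_{\mathrm{emo}} b_{\mathrm{spk}} + b_{\mathrm{emo}}.
\end{equation}

Because the composition of two affine transforms is again affine, such a cascade can be stored and applied at synthesis time like a single adaptation transform over the average voice model.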
Pages: 292-307
Number of pages: 16