Emotion transplantation through adaptation in HMM-based speech synthesis

Cited by: 24

Authors:

Lorenzo-Trueba, Jaime [1]

Barra-Chicote, Roberto [1]

San-Segundo, Ruben [1]

Ferreiros, Javier [1]

Yamagishi, Junichi [2]

Montero, Juan M. [1]

Affiliations:

[1] Univ Politecn Madrid, ETSI Telecomunicac, Speech Technol Grp, E-28040 Madrid, Spain

[2] Univ Edinburgh, CSTR, Informat Forum, Edinburgh EH8 9AB, Midlothian, Scotland

Source:

COMPUTER SPEECH AND LANGUAGE | 2015 / Vol. 34 / Issue 01

Keywords:

Statistical parametric speech synthesis; Expressive speech synthesis; Cascade adaptation; Emotion transplantation;

DOI:

10.1016/j.csl.2015.03.008

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes an emotion transplantation method capable of modifying a synthetic speech model through the use of CSMAPLR adaptation in order to incorporate emotional information learned from a different speaker model while maintaining the identity of the original speaker as much as possible. The proposed method relies on learning both emotional and speaker identity information by means of their adaptation function from an average voice model, and combining them into a single cascade transform capable of imbuing the desired emotion into the target speaker. This method is then applied to the task of transplanting four emotions (anger, happiness, sadness and surprise) into 3 male speakers and 3 female speakers and evaluated in a number of perceptual tests. The results of the evaluations show how the perceived naturalness for emotional text significantly favors the use of the proposed transplanted emotional speech synthesis when compared to traditional neutral speech synthesis, evidenced by a big increase in the perceived emotional strength of the synthesized utterances at a slight cost in speech quality. A final evaluation with a robotic laboratory assistant application shows how by using emotional speech we can significantly increase the students' satisfaction with the dialog system, proving how the proposed emotion transplantation system provides benefits in real applications. (C) 2015 Elsevier Ltd. All rights reserved.

Pages: 292-307

Number of pages: 16

