Emotion transplantation through adaptation in HMM-based speech synthesis

Cited by: 24
Authors
Lorenzo-Trueba, Jaime [1 ]
Barra-Chicote, Roberto [1 ]
San-Segundo, Ruben [1 ]
Ferreiros, Javier [1 ]
Yamagishi, Junichi [2 ]
Montero, Juan M. [1 ]
Affiliations
[1] Univ Politecn Madrid, ETSI Telecomunicac, Speech Technol Grp, E-28040 Madrid, Spain
[2] Univ Edinburgh, CSTR, Informat Forum, Edinburgh EH8 9AB, Midlothian, Scotland
Source
COMPUTER SPEECH AND LANGUAGE | 2015, Vol. 34, No. 1
Keywords
Statistical parametric speech synthesis; Expressive speech synthesis; Cascade adaptation; Emotion transplantation;
DOI
10.1016/j.csl.2015.03.008
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes an emotion transplantation method that modifies a synthetic speech model through CSMAPLR adaptation in order to incorporate emotional information learned from a different speaker model while preserving the identity of the original speaker as much as possible. The method learns both the emotional and the speaker-identity information as adaptation functions estimated from an average voice model, and combines them into a single cascade transform capable of imbuing the desired emotion into the target speaker. The method is then applied to the task of transplanting four emotions (anger, happiness, sadness and surprise) into three male and three female speakers, and is evaluated in a number of perceptual tests. The evaluations show that, for emotional text, listeners significantly prefer the proposed transplanted emotional speech synthesis over traditional neutral speech synthesis: the perceived emotional strength of the synthesized utterances increases substantially at only a slight cost in speech quality. A final evaluation with a robotic laboratory assistant application shows that using emotional speech significantly increases students' satisfaction with the dialog system, demonstrating that the proposed emotion transplantation system provides benefits in real applications. (C) 2015 Elsevier Ltd. All rights reserved.
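The cascade described in the abstract can be pictured as the composition of two affine (linear-regression) mean transforms, one capturing speaker identity and one capturing emotion, both estimated against a common average voice model. The sketch below only illustrates that composition under simplifying assumptions: it ignores CSMAPLR's structured priors, regression-class trees and covariance transforms, and the names `compose_affine` and `transplant_means` are hypothetical, not from the paper.

```python
# Minimal sketch (assumption-laden) of cascading two affine mean transforms,
# one for speaker identity and one for emotion, into a single transform that
# is then applied to the Gaussian mean vectors of an average voice model.
import numpy as np

def compose_affine(A_emo, b_emo, A_spk, b_spk):
    """Compose the emotion transform after the speaker transform:
    mu'' = A_emo @ (A_spk @ mu + b_spk) + b_emo
         = (A_emo @ A_spk) @ mu + (A_emo @ b_spk + b_emo)."""
    A = A_emo @ A_spk
    b = A_emo @ b_spk + b_emo
    return A, b

def transplant_means(means, A, b):
    """Apply the cascaded affine transform to each mean vector (one per row)."""
    return means @ A.T + b

# Toy usage with random placeholder transforms (dimensions are illustrative only).
dim = 40                                     # e.g. spectral feature dimension
rng = np.random.default_rng(0)
A_spk = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
b_spk = 0.1 * rng.standard_normal(dim)
A_emo = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
b_emo = 0.1 * rng.standard_normal(dim)
avg_voice_means = rng.standard_normal((5, dim))   # stand-in for average-voice means

A_cas, b_cas = compose_affine(A_emo, b_emo, A_spk, b_spk)
emotional_target_means = transplant_means(avg_voice_means, A_cas, b_cas)
print(emotional_target_means.shape)          # (5, 40)
```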
Pages: 292-307
Number of pages: 16