Emotion transplantation through adaptation in HMM-based speech synthesis

Cited by: 24
|
Authors
Lorenzo-Trueba, Jaime [1 ]
Barra-Chicote, Roberto [1 ]
San-Segundo, Ruben [1 ]
Ferreiros, Javier [1 ]
Yamagishi, Junichi [2 ]
Montero, Juan M. [1 ]
Affiliations
[1] Univ Politecn Madrid, ETSI Telecomunicac, Speech Technol Grp, E-28040 Madrid, Spain
[2] Univ Edinburgh, CSTR, Informat Forum, Edinburgh EH8 9AB, Midlothian, Scotland
Source
COMPUTER SPEECH AND LANGUAGE | 2015, Vol. 34, No. 1
Keywords
Statistical parametric speech synthesis; Expressive speech synthesis; Cascade adaptation; Emotion transplantation;
DOI
10.1016/j.csl.2015.03.008
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an emotion transplantation method that modifies a synthetic speech model through CSMAPLR adaptation in order to incorporate emotional information learned from a different speaker's model while preserving the identity of the original speaker as much as possible. The method learns the emotion and the speaker identity as separate adaptation functions from an average voice model and combines them into a single cascade transform that imbues the target speaker with the desired emotion. It is then applied to transplanting four emotions (anger, happiness, sadness and surprise) into three male and three female speakers and evaluated in a series of perceptual tests. The evaluations show that, for emotional text, perceived naturalness significantly favors the proposed transplanted emotional speech synthesis over traditional neutral speech synthesis, with a large increase in the perceived emotional strength of the synthesized utterances at a slight cost in speech quality. A final evaluation with a robotic laboratory assistant application shows that using emotional speech significantly increases the students' satisfaction with the dialog system, demonstrating that the proposed emotion transplantation approach provides benefits in real applications. (C) 2015 Elsevier Ltd. All rights reserved.
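As a rough illustration of the cascade transform described in the abstract: the following is a minimal sketch in terms of standard linear-regression mean adaptation; the symbols, the ordering of the two transforms, and the restriction to Gaussian means are assumptions made here for clarity, not the paper's exact formulation. An average-voice mean vector is first mapped by a speaker transform estimated for the target speaker and then by an emotion transform estimated from the expressive speaker's data, so the two adaptation functions collapse into one affine transform:

\[
\hat{\mu} = A_{\mathrm{emo}}\bigl(A_{\mathrm{spk}}\,\mu_{\mathrm{avg}} + b_{\mathrm{spk}}\bigr) + b_{\mathrm{emo}}
          = \bigl(A_{\mathrm{emo}}A_{\mathrm{spk}}\bigr)\,\mu_{\mathrm{avg}} + \bigl(A_{\mathrm{emo}}\,b_{\mathrm{spk}} + b_{\mathrm{emo}}\bigr)
\]

Because both transforms are estimated relative to the same average voice model, they can be composed in this way without retraining the target speaker's model on emotional data, which is what allows the emotion to be transplanted across speakers.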
Pages: 292 - 307
Number of pages: 16
Related papers
50 items in total
  • [21] UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
    Oura, Keiichiro
    Tokuda, Keiichi
    Yamagishi, Junichi
    King, Simon
    Wester, Mirjam
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4594 - 4597
  • [22] Speaker Adaptation using Nonlinear Regression Techniques for HMM-based Speech Synthesis
    Hong, Doo Hwa
    Kang, Shin Jae
    Lee, Joun Yeop
    Kim, Nam Soo
    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014, : 586 - 589
  • [23] Noise in HMM-Based Speech Synthesis Adaptation: Analysis, Evaluation Methods and Experiments
    Karhila, Reima
    Remes, Ulpu
    Kurimo, Mikko
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 285 - 295
  • [24] HMM-Based Speech Synthesis for the Greek Language
    Karabetsos, Sotiris
    Tsiakoulis, Pirros
    Chalamandaris, Aimilios
    Raptis, Spyros
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2008, 5246 : 349 - 356
  • [25] A BAYESIAN APPROACH TO HMM-BASED SPEECH SYNTHESIS
    Hashimoto, Kei
    Zen, Heiga
    Nankaku, Yoshihiko
    Masuko, Takashi
    Tokuda, Keiichi
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 4029 - +
  • [26] An improved minimum generation error based model adaptation for HMM-based speech synthesis
    Wu, Yi-Jian
    Qin, Long
    Tokuda, Keiichi
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1727 - +
  • [27] An HMM-based Vietnamese Speech Synthesis System
    Vu, Thang Tat
    Luong, Mai Chi
    Nakamura, Satoshi
    ORIENTAL COCOSDA 2009 - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2009, : 116 - +
  • [28] An HMM-based Cantonese Speech Synthesis System
    Wang, Xin
    Wu, Zhiyong
    2012 IEEE GLOBAL HIGH TECH CONGRESS ON ELECTRONICS (GHTCE), 2012
  • [29] Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm
    Yamagishi, Junichi
    Kobayashi, Takao
    Nakano, Yuji
    Ogata, Katsumi
    Isogai, Juri
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (01): 66 - 83
  • [30] Thousands of Voices for HMM-based Speech Synthesis
    Yamagishi, Junichi
    Usabaev, Bela
    King, Simon
    Watts, Oliver
    Dines, John
    Tian, Jilei
    Hu, Rile
    Guan, Yong
    Oura, Keiichiro
    Tokuda, Keiichi
    Karhila, Reima
    Kurimo, Mikko
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 416 - +