A Comparison of Voice Conversion Methods for Transforming Voice Quality in Emotional Speech Synthesis

Cited by: 0

Authors:

Tuerk, Oytun [1]

Schroeder, Marc [1]

Affiliations:

[1] DFKI GmbH, Language Technol Lab, Saarbrucken, Germany

Source:

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | 2008

Keywords:

voice quality transformation; voice conversion; emotional speech synthesis;

DOI:

Not available

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper compares methods for transforming voice quality in neutral synthetic speech to match cheerful, aggressive, and depressed expressive styles. Neutral speech is generated using the unit selection system in the MARY TTS platform and a large neutral German database. The output is then modified with voice conversion techniques to match the target expressive styles, focusing on spectral envelope conversion to transform the overall voice quality. Various improvements over the state-of-the-art weighted codebook mapping and GMM-based voice conversion frameworks are employed, resulting in three algorithms. Objective evaluation shows that all three methods yield a comparable reduction in objective distance to the target expressive TTS outputs. In a listening test, however, the weighted frame mapping and GMM-based transformations were perceived as slightly better than the weighted codebook mapping outputs at generating the target expressive style.

Pages: 2282-2285

Page count: 4
