A Comparison of Voice Conversion Methods for Transforming Voice Quality in Emotional Speech Synthesis

Authors
Tuerk, Oytun [1]
Schroeder, Marc [1]
Affiliations
[1] DFKI GmbH, Language Technol Lab, Saarbrucken, Germany
Keywords
voice quality transformation; voice conversion; emotional speech synthesis
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a comparison of methods for transforming voice quality in neutral synthetic speech to match cheerful, aggressive, and depressed expressive styles. Neutral speech is generated using the unit selection system in the MARY TTS platform and a large neutral database in German. The output is modified using voice conversion techniques to match the target expressive styles, with the focus on spectral envelope conversion for transforming the overall voice quality. Various improvements over the state-of-the-art weighted codebook mapping and GMM-based voice conversion frameworks are employed, resulting in three algorithms. Objective evaluation results show that all three methods yield a comparable reduction in objective distance to the target expressive TTS outputs, whereas in a listening test the weighted frame mapping and GMM-based transformations were perceived as slightly better than the weighted codebook mapping outputs at generating the target expressive style.
Pages: 2282-2285
Number of pages: 4