Evaluation of Expressive Speech Synthesis With Voice Conversion and Copy Resynthesis Techniques

Cited by: 32
Authors
Turk, Oytun [1]
Schroeder, Marc [2]
Affiliations
[1] Sensory Inc, Portland, OR 97209 USA
[2] DFKI GmbH Language Technol Lab, Speech Grp, D-66123 Saarbrucken, Germany
Keywords
Expressive speech synthesis; prosody; voice conversion; voice quality transformation
DOI
10.1109/TASL.2010.2041113
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Generating expressive synthetic voices requires carefully designed databases that contain a sufficient amount of expressive speech material. This paper investigates voice conversion and modification techniques to reduce database collection and processing efforts while maintaining acceptable quality and naturalness. In a factorial design, we study the relative contributions of voice quality and prosody as well as the amount of distortion introduced by the respective signal manipulation steps. The unit selection engine in our open-source, modular text-to-speech (TTS) framework MARY is extended with voice quality transformation using either GMM-based prediction or vocal tract copy resynthesis. These algorithms are then cross-combined with various prosody copy resynthesis methods. The overall expressive speech generation process functions as a postprocessing step on TTS outputs to transform neutral synthetic speech into aggressive, cheerful, or depressed speech. Cross-combinations of voice quality and prosody transformation algorithms are compared in listening tests for perceived expressive style and quality. The results show that there is a tradeoff between identification and naturalness. Combined modeling of both voice quality and prosody leads to the best identification scores at the expense of the lowest naturalness ratings. The fine detail of both voice quality and prosody, as preserved by the copy synthesis, contributed to better identification than the approximate models.
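The cross-combination described in the abstract can be pictured as a full factorial grid of conditions. The following is a minimal sketch only, not the authors' code: the "none" baselines and the prosody method names (`f0_copy`, `f0_duration_copy`) are hypothetical placeholders, since the abstract mentions only "various prosody copy resynthesis methods" without naming them.

```python
from itertools import product

# Voice-quality options named in the abstract, plus an assumed untransformed baseline.
VOICE_QUALITY = ["none", "gmm_prediction", "vocal_tract_copy"]
# Hypothetical prosody options; the abstract does not name the individual methods.
PROSODY = ["none", "f0_copy", "f0_duration_copy"]
# Target expressive styles listed in the abstract.
STYLES = ["aggressive", "cheerful", "depressed"]


def build_conditions(voice_quality=VOICE_QUALITY, prosody=PROSODY, styles=STYLES):
    """Return every (style, voice-quality method, prosody method) combination."""
    return [
        {"style": s, "voice_quality": vq, "prosody": p}
        for s, vq, p in product(styles, voice_quality, prosody)
    ]


if __name__ == "__main__":
    conditions = build_conditions()
    print(len(conditions), "conditions in the grid")
    print(conditions[0])
```

Each dictionary would correspond to one stimulus condition in a listening test; the actual number of conditions in the paper may differ from this illustrative grid.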
Pages: 965-973
Number of pages: 9