Effective Data Augmentation Methods for Neural Text-to-Speech Systems

Cited by: 0
Authors
Oh, Suhyeon [1]
Kwon, Ohsung [1]
Hwang, Min-Jae [1]
Kim, Jae-Min [1]
Song, Eunwoo [1]
Affiliation
[1] NAVER Corp, Seongnam, South Korea
Keywords
speech synthesis; self-augmentation; ranking support vector machine
DOI
10.1109/ICEIC54506.2022.9748515
Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline codes
0808; 0809
Abstract
This paper proposes an effective self-augmentation method for improving the quality of neural text-to-speech (TTS) systems. Because the quality of synthetic speech has improved greatly, it is now possible to build a neural TTS system from synthetic corpora. However, it has not been verified whether increasing the amount of synthetic data always improves training efficiency. Our aim in this study is to select only those synthetic data whose characteristics are close to those of natural speech. Specifically, we adopt a ranking support vector machine (RankSVM), which is well known for effectively ranking relative attributes between two classes. Treating the synthetic and recorded corpora as two opposite classes, we use RankSVM to measure how acoustically similar the synthesized speech is to the recorded data. Because training data can thus be selected from large-scale synthetic corpora, the TTS model re-trained on the selected data performs significantly better. Subjective evaluation results verify that the proposed TTS model performs much better than both the original model trained on recorded data alone and a similarly configured system re-trained on all the synthetic data without any selection.
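The abstract does not specify the acoustic features, the RankSVM solver, or the selection threshold. The following is therefore only a minimal sketch of the selection idea under stated assumptions. Each utterance is assumed to be summarized by a fixed-length feature vector (hypothetical toy values here). Every (recorded, synthetic) pair is treated as an ordered pair in which the recorded item should rank higher. A linear RankSVM is fit by plain subgradient descent on the pairwise hinge loss, and the highest-scoring synthetic utterances are kept for re-training. The names `train_rank_svm` and `select_top_synthetic`, the solver, and the top-k rule are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_rank_svm(
    preferred: List[Vector],
    other: List[Vector],
    c: float = 1.0,
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Fit a linear RankSVM so that w . x_p > w . x_o for every (preferred, other) pair.

    Minimizes 0.5 * ||w||^2 + (c / n_pairs) * sum max(0, 1 - w . (x_p - x_o))
    with full-batch subgradient descent (an assumed solver, chosen for brevity).
    """
    dim = len(preferred[0])
    # With two classes, every (recorded, synthetic) combination is an ordered pair.
    diffs = [[p - o for p, o in zip(xp, xo)] for xp in preferred for xo in other]
    scale = c / len(diffs)
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the 0.5 * ||w||^2 regularizer
        for d in diffs:
            if dot(w, d) < 1.0:  # margin violated: add the hinge subgradient
                for k in range(dim):
                    grad[k] -= scale * d[k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def select_top_synthetic(w: Vector, synthetic: List[Vector], keep: int) -> List[int]:
    """Return indices of the `keep` synthetic utterances scored most like recorded speech."""
    order = sorted(range(len(synthetic)), key=lambda i: dot(w, synthetic[i]), reverse=True)
    return order[:keep]


# Hypothetical toy 2-D features: recorded speech clusters near (1, 1).
recorded = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]]
synthetic = [[0.0, 0.0], [0.9, 0.9], [0.2, 0.1], [0.5, 0.6]]

w = train_rank_svm(recorded, synthetic)
selected = select_top_synthetic(w, synthetic, keep=2)
print(selected)  # → [1, 3]
```

Training on pairwise difference vectors is the standard reduction of RankSVM to a binary linear SVM. An off-the-shelf linear SVM could replace the hand-written loop. In a real system the features and the number of utterances kept would have to follow the paper itself.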
Pages: 4