Effective Data Augmentation Methods for Neural Text-to-Speech Systems

Cited by: 0
Authors
Oh, Suhyeon [1]
Kwon, Ohsung [1]
Hwang, Min-Jae [1]
Kim, Jae-Min [1]
Song, Eunwoo [1]
Affiliation
[1] NAVER Corp, Seongnam, South Korea
Keywords
speech synthesis; self-augmentation; ranking support vector machine
DOI
10.1109/ICEIC54506.2022.9748515
CLC classification numbers
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract
This paper proposes an effective self-augmentation method for improving the quality of neural text-to-speech (TTS) systems. As the quality of synthetic speech has greatly improved, it is now possible to build a neural TTS system from synthetic corpora. However, it has not been verified whether increasing the amount of synthetic data always improves training efficiency. Our aim in this study is to select synthetic data whose characteristics are close to those of natural speech. Specifically, we adopt a ranking support vector machine (RankSVM), which is well known for effectively ranking relative attributes between two classes. By treating the synthetic and recorded corpora as two opposite classes, we use RankSVM to measure how acoustically similar each synthesized utterance is to the recorded data. Because training data can be selectively chosen from large-scale synthetic corpora, the performance of the TTS model re-trained on the selected data is significantly improved. Subjective evaluation results verify that the proposed TTS model performs much better than both the original model trained on recorded data alone and a similarly configured system re-trained on all the synthetic data without any selection.
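The selection idea in the abstract (recorded and synthetic corpora as two opposite classes, a RankSVM scoring how "natural" each synthetic utterance looks, then keeping the best-ranked ones) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each utterance is already summarized by a fixed-length acoustic feature vector (the abstract does not say which features are used), it uses a generic linear RankSVM trained by stochastic subgradient descent on a pairwise hinge loss, and it assumes selection keeps the top-k synthetic utterances by score. The function names, hyperparameters, and toy data are hypothetical.

```python
import random


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_rank_svm(recorded, synthetic, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Hypothetical linear RankSVM via stochastic subgradient descent.

    Each (recorded, synthetic) pair is a preference constraint: the score of
    the recorded utterance should exceed the synthetic one by a margin of 1.
    Objective per pair: lam/2 * ||w||^2 + max(0, 1 - w . (x_rec - x_syn)).
    """
    rng = random.Random(seed)
    w = [0.0] * len(recorded[0])
    # Pairwise difference vectors d = x_rec - x_syn; we want w . d >= 1.
    diffs = [[r - s for r, s in zip(xr, xs)] for xr in recorded for xs in synthetic]
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            if dot(w, d) < 1.0:
                # Margin violated: regularization step plus a push along d.
                w = [wi - lr * (lam * wi - di) for wi, di in zip(w, d)]
            else:
                # Margin satisfied: only the regularization (weight decay) step.
                w = [wi - lr * lam * wi for wi in w]
    return w


def select_synthetic(w, synthetic, k):
    """Indices of the k synthetic utterances ranked most like recorded speech."""
    ranked = sorted(range(len(synthetic)),
                    key=lambda i: dot(w, synthetic[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy 2-D "acoustic features" (hypothetical): recorded speech clusters near (1, 1).
    recorded = [[1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
    synthetic = [[0.8, 0.9], [-1.0, -0.8], [0.2, 0.1], [-0.5, -0.6]]
    w = train_rank_svm(recorded, synthetic)
    print(select_synthetic(w, synthetic, k=2))  # -> [0, 2]
```

Building every (recorded, synthetic) pair makes training cost grow with the product of the two corpus sizes, so for large-scale synthetic corpora one would likely sample pairs instead; the top-k rule could equally be replaced by a score threshold.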
Pages: 4