The paradigm for creating multi-lingual text-to-speech voice databases

Authors
Chu, Min [1]
Zhao, Yong [1]
Chen, Yining [1]
Wang, Lijuan [1]
Soong, Frank [1]
Affiliation
[1] Microsoft Res Asia, Beijing, Peoples R China
Keywords
multi-lingual; text-to-speech; voice database
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The voice database is one of the most important parts of a TTS system. However, creating a high-quality new TTS voice is not an easy task, even for a professional team. The whole process is rather complicated and involves many details that must be handled carefully. In fact, many stages require human intervention, such as manual checking or labeling. In multi-lingual situations, it is even more challenging to find qualified people to do this work. That is why most state-of-the-art TTS systems can provide only a few voices. In this paper, we outline a uniform paradigm for creating multi-lingual TTS voice databases. It focuses on technologies that can either improve the scalability of data collection or reduce the need for human intervention such as manual checking or labeling. With this paradigm, we decrease the complexity and workload of the task.
Pages: 736 / +
Page count: 3