The paradigm for creating multi-lingual text-to-speech voice databases

Authors
Chu, Min [1]
Zhao, Yong [1]
Chen, Yining [1]
Wang, Lijuan [1]
Soong, Frank [1]
Affiliations
[1] Microsoft Res Asia, Beijing, Peoples R China
Keywords
multi-lingual; text-to-speech; voice database
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The voice database is one of the most important parts of a TTS system. However, creating a new high-quality TTS voice is not an easy task, even for a professional team. The whole process is rather complicated and contains many minutiae that must be handled carefully. In fact, many stages require human intervention such as manual checking or labeling. In multi-lingual situations, it is even more challenging to find qualified people to carry out this kind of intervention. That is why most state-of-the-art TTS systems can provide only a few voices. In this paper, we outline a uniform paradigm for creating multi-lingual TTS voice databases. It focuses on technologies that can either improve the scalability of data collection or reduce the need for manual checking and labeling. With this paradigm, we decrease the complexity and workload of the task.
Pages: 736 / +
Page count: 3