The paradigm for creating multi-lingual text-to-speech voice databases

Cited by: 0

Authors:

Chu, Min [1]

Zhao, Yong [1]

Chen, Yining [1]

Wang, Lijuan [1]

Soong, Frank [1]

Affiliations:

[1] Microsoft Res Asia, Beijing, Peoples R China

Source:

CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS | 2006 / Vol. 4274

Keywords:

multi-lingual; text-to-speech; voice database;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The voice database is one of the most important parts of a TTS system. However, creating a high-quality new TTS voice is not an easy task, even for a professional team. The whole process is rather complicated and contains many details that must be handled carefully. In fact, many stages require human intervention, such as manual checking or labeling. In multi-lingual situations, it is even more challenging to find qualified people to perform this kind of intervention. That is why most state-of-the-art TTS systems can provide only a few voices. In this paper, we outline a uniform paradigm for creating multi-lingual TTS voice databases. It focuses on technologies that can either improve the scalability of data collection or reduce the need for human intervention. With this paradigm, we decrease the complexity and workload of the task.

Citation

Pages: 736 / +

Page count: 3
