The paradigm for creating multi-lingual text-to-speech voice databases

Authors
Chu, Min [1]
Zhao, Yong [1]
Chen, Yining [1]
Wang, Lijuan [1]
Soong, Frank [1]
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
multi-lingual; text-to-speech; voice database
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The voice database is one of the most important components of a TTS system. However, creating a new high-quality TTS voice is not easy, even for a professional team. The whole process is rather complicated and involves many details that must be handled carefully. In fact, many stages require human intervention, such as manual checking or labeling. In multi-lingual settings, it is even more challenging to find qualified people to do this work. That is why most state-of-the-art TTS systems provide only a few voices. In this paper, we outline a uniform paradigm for creating multi-lingual TTS voice databases. It focuses on technologies that either improve the scalability of data collection or reduce the need for human intervention such as manual checking or labeling. With this paradigm, we reduce the complexity and workload of the task.
Pages: 736 / +
Page count: 3