LIGHT-TTS: LIGHTWEIGHT MULTI-SPEAKER MULTI-LINGUAL TEXT-TO-SPEECH

Times cited: 5

Authors:

Li, Song [1]

Ouyang, Beibei [1]

Li, Lin [1]

Hong, Qingyang [2]

Affiliations:

[1] Xiamen University, School of Electronic Science and Engineering, Xiamen, People's Republic of China

[2] Xiamen University, School of Informatics, Xiamen, People's Republic of China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Funding:

National Natural Science Foundation of China

Keywords:

Multi-speaker; multi-lingual; speech synthesis; non-autoregressive; lightweight;

DOI:

10.1109/ICASSP39728.2021.9414400

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

With the development of deep learning, end-to-end neural text-to-speech (TTS) systems have achieved significant improvements in high-quality speech synthesis. However, most of these systems are attention-based autoregressive models, which results in slow synthesis and a large number of model parameters. In addition, speech in different languages is usually synthesized with separate models, which increases the complexity of speech synthesis systems. In this paper, we propose LightTTS, a new lightweight multi-speaker multi-lingual speech synthesis system that uses a single model to quickly synthesize Chinese, English, or code-switched speech for multiple speakers in a non-autoregressive manner. Moreover, compared with FastSpeech using the same number of neural network layers and nodes, LightTTS achieves a 2.50x speedup in mel-spectrogram generation on CPU and reduces the number of parameters by a factor of 12.83.


Pages: 8383-8387

Number of pages: 5
