EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion

Cited by: 1

Authors:

Miao, Chenfeng [1]

Zhu, Qingying [1]

Chen, Minchuan [1]

Ma, Jun [1]

Wang, Shaojun [1]

Xiao, Jing [1]

Affiliations:

[1] Ping Technol, Shanghai 200120, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2024 / Vol. 32

Keywords:

Training; Vectors; Computational modeling; Task analysis; Acoustics; Couplings; Computer architecture; Text-to-speech; speech synthesis; voice conversion; differentiable aligner; VAE; hierarchical-VAE; end-to-end;

DOI:

10.1109/TASLP.2024.3369528

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Recently, the field of Text-to-Speech (TTS) has been dominated by one-stage text-to-waveform models, which have significantly improved speech quality compared to two-stage models. In this work, we propose EfficientTTS 2 (EFTS2), a one-stage, high-quality, end-to-end TTS framework that is fully differentiable and highly efficient. Our method adopts an adversarial training process, with a differentiable aligner and a hierarchical-VAE-based waveform generator. These design choices free the model from the external aligners, invertible structures, and complex training procedures that most previous TTS work relies on. Moreover, we extend EFTS2 to the voice conversion (VC) task and propose EFTS2-VC, an end-to-end VC model that allows high-quality speech-to-speech conversion. Experimental results suggest that the two proposed models achieve better or at least comparable speech quality compared to baseline models, while also providing faster inference speeds and smaller model sizes.

Pages: 1650-1661

Page count: 12
