EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion

被引:1
|
作者
Miao, Chenfeng [1 ]
Zhu, Qingying [1 ]
Chen, Minchuan [1 ]
Ma, Jun [1 ]
Wang, Shaojun [1 ]
Xiao, Jing [1 ]
机构
[1] Ping Technol, Shanghai 200120, Peoples R China
关键词
Training; Vectors; Computational modeling; Task analysis; Acoustics; Couplings; Computer architecture; Text-to-speech; speech synthesis; voice conversion; differentiable aligner; VAE; hierarchical-VAE; end-to-end;
D O I
10.1109/TASLP.2024.3369528
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Recently, the field of Text-to-Speech (TTS) has been dominated by one-stage text-to-waveform models which have significantly improved speech quality compared to two-stage models. In this work, we propose EfficientTTS 2 (EFTS2), a one-stage high-quality end-to-end TTS framework that is fully differentiable and highly efficient. Our method adopts an adversarial training process, with a differentiable aligner and a hierarchical-VAE-based waveform generator. These design choices free the model from the use of external aligners, invertible structures, and complex training procedures as most previous TTS works have. Moreover, we extend EFTS2 to the voice conversion (VC) task and propose EFTS2-VC, an end-to-end VC model that allows high-quality speech-to-speech conversion. Experimental results suggest that the two proposed models achieve better or at least comparable speech quality compared to baseline models, while also providing faster inference speeds and smaller model sizes.
引用
收藏
页码:1650 / 1661
页数:12
相关论文
共 50 条
  • [1] Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
    Kim, Jaehyeon
    Kong, Jungil
    Son, Juhee
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [2] Improvement of the end-to-end scene text recognition method for "text-to-speech" conversion
    Makhmudov, Fazliddin
    Mukhiddinov, Mukhriddin
    Abdusalomov, Akmalbek
    Avazov, Kuldoshbay
    Khamdamov, Utkir
    Cho, Young Im
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2020, 18 (06)
  • [3] EXPLORING END-TO-END NEURAL TEXT-TO-SPEECH SYNTHESIS FOR ROMANIAN
    Dumitrache, Marius
    Rebedea, Traian
    PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE LINGUISTIC RESOURCES AND TOOLS FOR NATURAL LANGUAGE PROCESSING, 2020, : 93 - 102
  • [4] Myanmar Text-to-Speech Synthesis Using End-to-End Model
    Qin, Qinglai
    Yang, Jian
    Li, Peiying
    2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020, 2020, : 6 - 11
  • [5] End-to-End Mongolian Text-to-Speech System
    Li, Jingdong
    Zhang, Hui
    Liu, Rui
    Zhang, Xueliang
    Bao, Feilong
    2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 483 - 487
  • [6] Improving transfer of expressivity for end-to-end multispeaker text-to-speech synthesis
    Kulkarni, Ajinkya
    Colotte, Vincent
    Jouvet, Denis
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 31 - 35
  • [7] End-to-End Thai Text-to-Speech with Linguistic Unit
    Wisetpaitoon, Kontawat
    Singkul, Sattaya
    Sakdejayont, Theerat
    Chalothorn, Tawunrat
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 951 - 959
  • [8] NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality
    Tan, Xu
    Chen, Jiawei
    Liu, Haohe
    Cong, Jian
    Zhang, Chen
    Liu, Yanqing
    Wang, Xi
    Leng, Yichong
    Yi, Yuanhao
    He, Lei
    Zhao, Sheng
    Qin, Tao
    Soong, Frank
    Liu, Tie-Yan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (06) : 4234 - 4245
  • [9] End-to-End Text-To-Speech synthesis for under resourced South African languages
    Nthite, Thapelo
    Tsoeu, Mohohlo
    2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020, : 684 - 689
  • [10] Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
    Ahmad, Hawraz A.
    Rashid, Tarik A.
    ALGORITHMS, 2024, 17 (07)