EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture

Authors
Miao, Chenfeng [1]
Liang, Shuang [1]
Liu, Zhencheng [1]
Chen, Minchuan [1]
Ma, Jun [1]
Wang, Shaojun [1]
Xiao, Jing [1]
Affiliation
[1] Ping An Technology, Shenzhen, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we address the Text-to-Speech (TTS) task by proposing a non-autoregressive architecture called EfficientTTS. Unlike the dominant non-autoregressive TTS models, which require external aligners for training, EfficientTTS optimizes all its parameters with a stable, end-to-end training procedure, allowing it to synthesize high-quality speech quickly and efficiently. EfficientTTS is motivated by a new monotonic alignment modeling approach, which imposes monotonic constraints on the sequence alignment with almost no increase in computation. By combining EfficientTTS with different feed-forward network structures, we develop a family of TTS models, including both text-to-melspectrogram and text-to-waveform networks. We show experimentally that the proposed models significantly outperform counterpart models such as Tacotron 2 (Shen et al., 2018) and Glow-TTS (Kim et al., 2020) in terms of speech quality, training efficiency, and synthesis speed, while still producing speech with strong robustness and great diversity. In addition, we demonstrate that the proposed approach can be easily extended to autoregressive models such as Tacotron 2.
Pages: 10