FLOW-TTS: A NON-AUTOREGRESSIVE NETWORK FOR TEXT TO SPEECH BASED ON FLOW

Cited: 0
Authors
Miao, Chenfeng [1 ]
Liang, Shuang [1 ]
Chen, Minchuan [1 ]
Ma, Jun [1 ]
Wang, Shaojun [1 ]
Xiao, Jing [1 ]
Affiliations
[1] Ping An Technol, Shenzhen, Peoples R China
Keywords
Text to speech; Non-autoregressive; Generative flow;
DOI
10.1109/icassp40776.2020.9054484
CLC Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
In this work, we propose Flow-TTS, a non-autoregressive end-to-end neural TTS model based on generative flow. Unlike other non-autoregressive models, Flow-TTS achieves high-quality speech generation with a single feed-forward network. To our knowledge, Flow-TTS is the first TTS model to utilize flow in the spectrogram generation network and the first non-autoregressive model to jointly learn alignment and spectrogram generation through a single network. Experiments on LJSpeech show that the speech quality of Flow-TTS closely approaches that of human speech and even exceeds that of the autoregressive model Tacotron 2 (outperforming it by a gap of 0.09 in MOS). Meanwhile, Flow-TTS achieves an inference speed-up of about 23x over Tacotron 2, which is comparable to FastSpeech.
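The "generative flow" the abstract refers to is built from invertible transforms: the model maps a spectrogram to a simple latent distribution during training, and inverts the same network in one feed-forward pass at inference. The sketch below is not Flow-TTS's actual architecture (which is not detailed in this record) but a minimal, generic Glow-style affine coupling step in NumPy; the weight matrices `W_s` and `W_t` stand in for the learned conditioning network and are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for a learned conditioning network (hypothetical weights).
W_s = rng.normal(size=(4, 4)) * 0.1  # produces the log-scale
W_t = rng.normal(size=(4, 4)) * 0.1  # produces the shift

def coupling(x, inverse=False):
    """One affine coupling step: split features in half; the first half
    passes through unchanged and parameterizes an elementwise affine map
    on the second half, so the transform is exactly invertible."""
    xa, xb = np.split(x, 2, axis=-1)
    log_s = np.tanh(xa @ W_s)  # bounded log-scale for numerical stability
    t = xa @ W_t
    if inverse:
        yb = (xb - t) * np.exp(-log_s)  # undo the affine map
    else:
        yb = xb * np.exp(log_s) + t
    return np.concatenate([xa, yb], axis=-1)

x = rng.normal(size=(3, 8))          # e.g. 3 frames of 8-dim features
z = coupling(x)                      # forward: data -> latent direction
x_rec = coupling(z, inverse=True)    # inverse: single feed-forward pass back
assert np.allclose(x, x_rec)
```

Because every step is invertible with a cheap closed-form inverse, generation needs no autoregressive loop, which is the source of the speed-up the abstract reports.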
Pages: 7209-7213
Page count: 5
Related Papers
50 records in total
  • [1] VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis
    Lu, Hui
    Wu, Zhiyong
    Wu, Xixin
    Li, Xu
    Kang, Shiyin
    Liu, Xunying
    Meng, Helen
    INTERSPEECH 2021, 2021, : 3775 - 3779
  • [2] FCH-TTS: Fast, Controllable and High-quality Non-Autoregressive Text-to-Speech Synthesis
    Zhou, Xun
    Zhou, Zhiyang
    Shi, Xiaodong
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [3] PARALLEL TACOTRON: NON-AUTOREGRESSIVE AND CONTROLLABLE TTS
    Elias, Isaac
    Zen, Heiga
    Shen, Jonathan
    Zhang, Yu
    Jia, Ye
    Weiss, Ron J.
    Wu, Yonghui
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5709 - 5713
  • [4] Estonian Text-to-Speech Synthesis with Non-autoregressive Transformers
    Ratsep, Liisa
    Lellep, Rasmus
    Fishel, Mark
    BALTIC JOURNAL OF MODERN COMPUTING, 2022, 10 (03): : 447 - 456
  • [5] MIXER-TTS: NON-AUTOREGRESSIVE, FAST AND COMPACT TEXT-TO-SPEECH MODEL CONDITIONED ON LANGUAGE MODEL EMBEDDINGS
    Tatanov, Oktai
    Beliaev, Stanislav
    Ginsburg, Boris
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7482 - 7486
  • [6] A COMPARATIVE STUDY ON NON-AUTOREGRESSIVE MODELINGS FOR SPEECH-TO-TEXT GENERATION
    Higuchi, Yosuke
    Chen, Nanxin
    Fujita, Yuya
    Inaguma, Hirofumi
    Komatsu, Tatsuya
    Lee, Jaesong
    Nozaki, Jumon
    Wang, Tianzi
    Watanabe, Shinji
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 47 - 54
  • [7] Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech
    Bae, Jae-Sung
    Bak, Tae-Jun
    Joo, Young-Sun
    Cho, Hoon-Young
    INTERSPEECH 2021, 2021, : 3610 - 3614
  • [8] CTC-based Non-autoregressive Speech Translation
    Xu, Chen
    Liu, Xiaoqian
    Liu, Xiaowen
    Sun, Qingxuan
    Zhang, Yuhao
    Yang, Murun
    Dong, Qianqian
    Ko, Tom
    Wang, Mingxuan
    Xiao, Tong
    Ma, Anxiang
    Zhu, Jingbo
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 13321 - 13339
  • [9] Non-Autoregressive Transformer for Speech Recognition
    Chen, Nanxin
    Watanabe, Shinji
    Villalba, Jesus
    Zelasko, Piotr
    Dehak, Najim
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 121 - 125
  • [10] LIGHTSPEECH: LIGHTWEIGHT NON-AUTOREGRESSIVE MULTI-SPEAKER TEXT-TO-SPEECH
    Li, Song
    Ouyang, Beibei
    Li, Lin
    Hong, Qingyang
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 499 - 506