FLOW-TTS: A NON-AUTOREGRESSIVE NETWORK FOR TEXT TO SPEECH BASED ON FLOW

Cited by: 0
Authors
Miao, Chenfeng [1]
Liang, Shuang [1]
Chen, Minchuan [1]
Ma, Jun [1]
Wang, Shaojun [1]
Xiao, Jing [1]
Affiliations
[1] Ping An Technol, Shenzhen, Peoples R China
Keywords
Text to speech; Non-autoregressive; Generative flow;
DOI
10.1109/icassp40776.2020.9054484
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In this work, we propose Flow-TTS, a non-autoregressive end-to-end neural TTS model based on generative flow. Unlike other non-autoregressive models, Flow-TTS achieves high-quality speech generation using a single feed-forward network. To our knowledge, Flow-TTS is the first TTS model to use flow in the spectrogram generation network and the first non-autoregressive model that jointly learns alignment and spectrogram generation through a single network. Experiments on LJSpeech show that the speech quality of Flow-TTS closely approaches that of human speech and is even better than that of the autoregressive model Tacotron 2 (outperforming Tacotron 2 by 0.09 in MOS). Meanwhile, Flow-TTS inference is about 23 times faster than Tacotron 2, a speed comparable to that of FastSpeech.
Pages: 7209-7213
Number of pages: 5