FLOW-TTS: A NON-AUTOREGRESSIVE NETWORK FOR TEXT TO SPEECH BASED ON FLOW

Cited by: 0
Authors
Miao, Chenfeng [1 ]
Liang, Shuang [1 ]
Chen, Minchuan [1 ]
Ma, Jun [1 ]
Wang, Shaojun [1 ]
Xiao, Jing [1 ]
Affiliations
[1] Ping An Technol, Shenzhen, Peoples R China
Keywords
Text to speech; Non-autoregressive; Generative flow;
DOI
10.1109/icassp40776.2020.9054484
CLC Number
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
In this work, we propose Flow-TTS, a non-autoregressive end-to-end neural TTS model based on generative flow. Unlike other non-autoregressive models, Flow-TTS achieves high-quality speech generation using a single feed-forward network. To our knowledge, Flow-TTS is the first TTS model to utilize flow in the spectrogram generation network and the first non-autoregressive model to jointly learn the alignment and spectrogram generation through a single network. Experiments on LJSpeech show that the speech quality of Flow-TTS closely approaches that of human speech and even exceeds that of the autoregressive model Tacotron 2 (outperforming Tacotron 2 by 0.09 MOS). Meanwhile, the inference speed of Flow-TTS is about 23 times faster than Tacotron 2, which is comparable to FastSpeech.
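The abstract describes spectrogram generation with a generative flow, i.e. an invertible network whose inverse pass maps noise to mel frames in a single feed-forward step. The sketch below is an illustration of that general idea only, not the authors' Flow-TTS architecture: a minimal affine coupling layer in PyTorch, with an assumed 80-dimensional frame size and hidden width chosen purely for the example.

# Minimal sketch of an affine coupling layer, the basic invertible building
# block of generative flows such as the one Flow-TTS is based on.
# NOT the authors' architecture; sizes and layout are hypothetical.
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """Half of the channels parameterize an affine transform of the other half,
    so the mapping is invertible and its Jacobian log-determinant is cheap."""

    def __init__(self, channels: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(channels // 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, channels),  # outputs log-scale and shift
        )

    def forward(self, x):
        xa, xb = x.chunk(2, dim=-1)
        log_s, t = self.net(xa).chunk(2, dim=-1)
        yb = xb * torch.exp(log_s) + t        # transform one half, keep the other
        log_det = log_s.sum(dim=-1)           # log |det Jacobian| for the flow loss
        return torch.cat([xa, yb], dim=-1), log_det

    def inverse(self, y):
        ya, yb = y.chunk(2, dim=-1)
        log_s, t = self.net(ya).chunk(2, dim=-1)
        xb = (yb - t) * torch.exp(-log_s)     # exact inverse, used at synthesis time
        return torch.cat([ya, xb], dim=-1)


if __name__ == "__main__":
    layer = AffineCoupling(channels=80)       # e.g. 80-dim mel frames (assumed)
    frames = torch.randn(4, 80)
    z, log_det = layer(frames)
    recon = layer.inverse(z)
    print(torch.allclose(recon, frames, atol=1e-5))  # True: the layer is invertible

Stacking such layers (with channel permutations between them) gives a model that is trained by maximum likelihood in the forward direction and sampled in one non-autoregressive pass in the inverse direction, which is the property the abstract's speed-up claim relies on.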
Pages: 7209-7213
Number of pages: 5
Related Papers
50 records in total
  • [31] A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS
    Guo, Haohan
    Lu, Hui
    Wu, Xixin
    Meng, Helen
    INTERSPEECH 2022, 2022, : 1566 - 1570
  • [32] Non-Autoregressive Text Generation with Pre-trained Language Models
    Su, Yixuan
    Cai, Deng
    Wang, Yan
    Vandyke, David
    Baker, Simon
    Li, Piji
    Collier, Nigel
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 234 - 243
  • [33] An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition
    Fan, Ruchao
    Chu, Wei
    Chang, Peng
    Xiao, Jing
    Alwan, Abeer
    INTERSPEECH 2021, 2021, : 3715 - 3719
  • [34] Non-Autoregressive Fully Parallel Deep Convolutional Neural Speech Synthesis
    Lee, Moa
    Lee, Junmo
    Chang, Joon-Hyuk
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1150 - 1159
  • [35] NON-AUTOREGRESSIVE TRANSFORMER WITH UNIFIED BIDIRECTIONAL DECODER FOR AUTOMATIC SPEECH RECOGNITION
    Zhang, Chuan-Fei
    Liu, Yan
    Zhang, Tian-Hao
    Chen, Song-Lu
    Chen, Feng
    Yin, Xu-Cheng
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6527 - 6531
  • [36] Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
    Kim, Jaehyeon
    Kim, Sungwon
    Kong, Jungil
    Yoon, Sungroh
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [37] NAST: A Non-Autoregressive Generator with Word Alignment for Unsupervised Text Style Transfer
    Huang, Fei
    Chen, Zikai
    Wu, Chen Henry
    Guo, Qihan
    Zhu, Xiaoyan
    Huang, Minlie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 1577 - 1590
  • [38] Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies
    Liu, Alexander H.
    Chung, Yu-An
    Glass, James
    INTERSPEECH 2021, 2021, : 3730 - 3734
  • [39] NON-AUTOREGRESSIVE MANDARIN-ENGLISH CODE-SWITCHING SPEECH RECOGNITION
    Chuang, Shun-Po
    Chang, Heng-Jui
    Huang, Sung-Feng
    Lee, Hung-yi
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 465 - 472
  • [40] Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation
    Chuang, Shun-Po
    Chuang, Yung-Sung
    Chang, Chih-Chiang
    Lee, Hung-yi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 1068 - 1077