IMPROVING GAN-BASED VOCODER FOR FAST AND HIGH-QUALITY SPEECH SYNTHESIS

Cited by: 0
Authors
He, Mengnan [1]
Guo, Tingwei [1]
Lu, Zhengxin [1]
Zhang, Ruixiong [1]
Gong, Caixia [1]
Affiliations
[1] DiDi Chuxing, Beijing, People's Republic of China
Source
Keywords
neural vocoder; Shuffle-Residual Block; Frequency Transformation Block; speech synthesis
DOI
10.21437/Interspeech.2022-730
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Following the success of Generative Adversarial Networks (GANs), GAN-based vocoders have recently achieved much faster waveform generation. However, the quality of the generated speech is slightly inferior, and the real-time factor (RTF) is still too high for many devices with limited resources. To address these issues, we propose a new GAN-based vocoder. First, we introduce the Shuffle-Residual Block into the generator to lower the RTF. Second, we propose a Frequency Transformation Block in the discriminator to capture the correlation between different frequency bins within each frame. To the best of our knowledge, our model achieves the lowest RTF among GAN-based vocoders while preserving speech quality. In our experiments, our model improves RTF by more than 40% and delivers higher speech quality than MB-MelGAN and HiFi-GAN V2.
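The abstract reports results in terms of RTF without defining it. As a minimal sketch, assuming the common definition (synthesis time divided by the duration of the generated audio) and reading the "40% improvement" as a relative reduction in RTF, the two quantities could be computed as follows. The function names and the numbers in the usage lines are illustrative assumptions, not values or code from the paper.

```python
def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Return RTF = synthesis time / duration of the generated audio.

    Assumes the common definition; an RTF below 1.0 means the vocoder
    generates audio faster than real time.
    """
    if synthesis_seconds < 0:
        raise ValueError("synthesis_seconds must be non-negative")
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def relative_rtf_improvement(baseline_rtf: float, new_rtf: float) -> float:
    """Return the fractional reduction in RTF relative to a baseline.

    Assumption: "improvement" means (baseline - new) / baseline.
    """
    if baseline_rtf <= 0:
        raise ValueError("baseline_rtf must be positive")
    return (baseline_rtf - new_rtf) / baseline_rtf


# Illustrative numbers only: 10 s of 22.05 kHz audio generated in 0.5 s.
rtf = real_time_factor(synthesis_seconds=0.5, num_samples=22050 * 10, sample_rate=22050)

# Illustrative numbers only: a baseline RTF of 0.10 reduced to 0.06.
gain = relative_rtf_improvement(baseline_rtf=0.10, new_rtf=0.06)
```

Under this definition, a lower RTF is better, so a positive value from `relative_rtf_improvement` indicates a speed-up over the baseline.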
Pages: 1601-1605
Number of pages: 5