IMPROVING GAN-BASED VOCODER FOR FAST AND HIGH-QUALITY SPEECH SYNTHESIS

Citations: 0

Authors:

He, Mengnan [1]

Guo, Tingwei [1]

Lu, Zhengxin [1]

Zhang, Ruixiong [1]

Gong, Caixia [1]

Affiliations:

[1] DiDi Chuxing, Beijing, People's Republic of China

Source:

INTERSPEECH 2022 | 2022

Keywords:

neural vocoder; Shuffle-Residual Block; Frequency Transformation Block; speech synthesis

DOI:

10.21437/Interspeech.2022-730

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Following the tremendous success of Generative Adversarial Networks (GANs), GAN-based vocoders have recently achieved much faster waveform generation. However, the quality of the generated speech is slightly inferior, and the real-time factor (RTF) is still unsatisfactory on many resource-limited devices. To address these issues, we propose a new GAN-based vocoder model. First, we introduce the Shuffle-Residual Block into the generator to obtain a lower RTF. Second, we propose a Frequency Transformation Block in the discriminator to capture the correlation between different frequency bins in every frame. To the best of our knowledge, our model achieves the lowest RTF among GAN-based vocoders while maintaining speech quality. In our experiments, our model shows a lower RTF, with more than 40% improvement, and higher speech quality than MB-MelGAN and HiFi-GAN V2.
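
Note on the real-time factor (RTF): RTF is commonly defined as the time taken to synthesize a waveform divided by the duration of that waveform, so values below 1 mean faster-than-real-time generation. The sketch below illustrates this definition only. The function names and all numbers are hypothetical, and reading the reported "40% improvement" as a relative reduction in RTF is an assumption, since this record does not state the measurement protocol.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def relative_rtf_reduction(baseline_rtf: float, new_rtf: float) -> float:
    """Return the fractional reduction of new_rtf relative to baseline_rtf.

    Assumption: "improvement" means (baseline - new) / baseline.
    """
    if baseline_rtf <= 0:
        raise ValueError("baseline_rtf must be positive")
    return (baseline_rtf - new_rtf) / baseline_rtf


if __name__ == "__main__":
    # Hypothetical timings, not taken from the paper.
    baseline = real_time_factor(synthesis_seconds=0.5, audio_seconds=10.0)
    candidate = real_time_factor(synthesis_seconds=0.3, audio_seconds=10.0)
    print(f"baseline RTF:  {baseline:.3f}")
    print(f"candidate RTF: {candidate:.3f}")
    print(f"relative reduction: {relative_rtf_reduction(baseline, candidate):.0%}")
```

With these made-up timings, the candidate model's RTF is 40% lower than the baseline's, which is the kind of comparison the abstract appears to describe.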


Pages: 1601-1605

Page count: 5
