Improving Generative Adversarial Network-based Vocoding through Multi-scale Convolution

Cited by: 0
Authors
Li, Wanting [1]
Chen, Yiting [1]
Tang, Buzhou [2,3]
Affiliations
[1] Harbin Inst Technol Shenzhen, Shenzhen, Guangdong, Peoples R China
[2] Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China
[3] Pengcheng Lab, Shenzhen, Peoples R China
Keywords
Speech generation; neural vocoder; speech synthesis
DOI
10.1145/3610532
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vocoding is a sub-process of the text-to-speech task that aims to generate audio from intermediate representations between text and audio. Several recent works have shown that vocoders based on generative adversarial networks (GANs) can generate high-quality audio. While GAN-based neural vocoders generate audio faster than autoregressive vocoders, their audio fidelity still cannot match that of ground-truth samples. One major cause of the degradation in audio quality and the blurring of spectrograms is the average pooling layers in the discriminator. Because the multi-scale discriminator commonly used by recent GAN-based vocoders applies several average pooling layers to capture different frequency bands, we believe it is crucial to prevent high-frequency information from being lost in the average pooling process. This article proposes MSCGAN, which addresses this problem and achieves higher-fidelity speech synthesis. We demonstrate that substituting the average pooling process with a multi-scale convolution architecture effectively retains high-frequency features and thus forces the generator to recover audio details in the time and frequency domains. Compared with other state-of-the-art GAN-based vocoders, MSCGAN produces competitive audio with higher spectrogram clarity and a higher mean opinion score in subjective human evaluation.
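The abstract's central claim, that average pooling discards high-frequency information while a convolution can keep it, can be illustrated with a toy example. The sketch below is not the MSCGAN architecture: the function names, the kernel size of 4, the stride of 2, and the hand-picked alternating kernel (standing in for weights a network might learn) are all assumptions chosen for illustration.

```python
def strided_conv1d(signal, kernel, stride):
    """Valid (no padding) 1-D cross-correlation with the given stride."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[start:start + k]))
        for start in range(0, len(signal) - k + 1, stride)
    ]


def avg_pool1d(signal, kernel_size, stride):
    """Average pooling is a strided convolution with a fixed uniform kernel."""
    return strided_conv1d(signal, [1.0 / kernel_size] * kernel_size, stride)


# Toy input: the highest frequency a sampled signal can carry (+1, -1, +1, ...).
high_freq = [1.0 if i % 2 == 0 else -1.0 for i in range(16)]

# Downsampling by average pooling: every window averages to zero.
pooled = avg_pool1d(high_freq, kernel_size=4, stride=2)

# Downsampling by a convolution whose weights are free to differ.
# This kernel is hand-picked (hypothetical), not learned.
alternating_kernel = [0.25, -0.25, 0.25, -0.25]
convolved = strided_conv1d(high_freq, alternating_kernel, stride=2)

# Energy of each downsampled signal: zero means the component was erased.
pooled_energy = sum(v * v for v in pooled)
convolved_energy = sum(v * v for v in convolved)

print("average pooling energy:", pooled_energy)
print("strided convolution energy:", convolved_energy)
```

Because the uniform kernel of average pooling is fixed, the oscillating component is cancelled before the discriminator can inspect it, whereas a convolution with trainable weights can in principle respond to it.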
Pages: 10