Improving Generative Adversarial Network-based Vocoding through Multi-scale Convolution

Cited by: 0
Authors
Li, Wanting [1]
Chen, Yiting [1]
Tang, Buzhou [2, 3]
Affiliations
[1] Harbin Inst Technol Shenzhen, Shenzhen, Guangdong, Peoples R China
[2] Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China
[3] Pengcheng Lab, Shenzhen, Peoples R China
Keywords
Speech generation; neural vocoder; speech synthesis
DOI
10.1145/3610532
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Vocoding is a sub-process of the text-to-speech task that aims to generate audio from intermediate representations between text and audio. Several recent works have shown that generative adversarial network (GAN)-based vocoders can generate high-quality audio. Although GAN-based neural vocoders generate audio faster than autoregressive vocoders, their audio fidelity still cannot match that of ground-truth samples. One major cause of the degraded audio quality and blurred spectrograms is the average pooling layers in the discriminator. Because the multi-scale discriminator commonly used by recent GAN-based vocoders applies several average pooling layers to capture different frequency bands, we believe it is crucial to prevent high-frequency information from being lost in the average pooling process. This article proposes MSCGAN, which addresses this problem and achieves higher-fidelity speech synthesis. We demonstrate that substituting the average pooling process with a multi-scale convolution architecture effectively retains high-frequency features and thus forces the generator to recover audio details in both the time and frequency domains. Compared with other state-of-the-art GAN-based vocoders, MSCGAN produces competitive audio with higher spectrogram clarity and a higher mean opinion score in subjective human evaluation.
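The abstract's claim that average pooling discards high-frequency content can be illustrated with a small, standard-library Python sketch. It is not the MSCGAN architecture: the helper names, the toy signals, and the hand-picked kernel `[1.0, -1.0]` are assumptions made for illustration only, whereas in a real discriminator the convolution weights would be learned.

```python
import math

def avg_pool(x, k):
    """Non-overlapping average pooling with window and stride k."""
    return [sum(x[i:i + k]) / k for i in range(0, len(x) - k + 1, k)]

def strided_conv(x, w, stride):
    """1-D 'valid' cross-correlation with kernel w and the given stride."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k))
            for i in range(0, len(x) - k + 1, stride)]

def rms(x):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

n = 64
# Highest representable frequency: the sample sign alternates every step.
high = [(-1.0) ** i for i in range(n)]
# Low frequency: a single sine cycle across the whole signal.
low = [math.sin(2 * math.pi * i / n) for i in range(n)]

# Average pooling by 2 cancels each +1/-1 pair, so the high tone vanishes,
# while the slowly varying tone passes almost unchanged.
pooled_high = avg_pool(high, 2)
pooled_low = avg_pool(low, 2)

# A stride-2 convolution with a hypothetical difference kernel downsamples
# by the same factor but keeps the high-frequency component.
conv_high = strided_conv(high, [1.0, -1.0], 2)

print("avg-pooled high tone RMS:", rms(pooled_high))
print("avg-pooled low tone RMS: ", rms(pooled_low))
print("strided-conv high tone RMS:", rms(conv_high))
```

Both operations halve the sequence length, but only the convolution has weights that can be set (or learned) to pass high-frequency detail; the averaging window is a fixed low-pass filter.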
Pages: 10