High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks

Cited by: 0
Authors
Yoneyama, Reo [1 ]
Wu, Yi-Chiao [1 ]
Toda, Tomoki [2 ]
Affiliations
[1] Nagoya Univ, Grad Sch Informat, Nagoya 4648601, Japan
[2] Nagoya Univ, Informat Technol Ctr, Nagoya 4648601, Japan
Funding
Japan Society for the Promotion of Science (JSPS);
Keywords
Vocoders; Controllability; Speech processing; Neural networks; Training; Mathematical models; Acoustics; Speech synthesis; neural vocoder; source-filter model; unified source-filter networks; WAVE-FORM GENERATION; SPEECH SYNTHESIS; MODEL;
DOI
10.1109/TASLP.2023.3313410
CLC Classification Number
O42 [Acoustics];
Discipline Classification Code
070206; 082403;
Abstract
We introduce unified source-filter generative adversarial networks (uSFGAN), a waveform generative model conditioned on acoustic features that represents the source-filter architecture within a single generator network. Unlike previous neural source-filter models, in which parametric signal processing modules are combined with neural networks, our approach enables unified optimization of both the source excitation generation and resonance filtering parts to achieve higher sound quality. In the uSFGAN framework, several dedicated regularization losses are proposed so that the source excitation generation part outputs reasonable source excitation signals. Both objective and subjective experiments are conducted, and the results demonstrate that the proposed uSFGAN achieves sound quality comparable to HiFi-GAN in the speech reconstruction task and outperforms WORLD in the F0 transformation task. Moreover, we argue that the F0-driven mechanism and the inductive bias obtained from source-filter modeling improve robustness against F0 values unseen during training, as shown by the experimental results. Audio samples are available at our demo site: https://chomeyama.github.io/PitchControllableNeuralVocoder-Demo/.
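To make the source-filter structure described in the abstract concrete, the following minimal PyTorch sketch couples a neural source-excitation network (driven by F0) with a neural resonance-filter network (conditioned on acoustic features) inside one generator, so both parts receive gradients jointly. This is not the authors' uSFGAN implementation; the sine-plus-noise excitation prior, module names, layer sizes, hop size, and sample rate are illustrative assumptions.

```python
# Conceptual sketch only: a jointly trainable source-filter generator.
# All architectural details below are assumptions, not the published uSFGAN design.
import math
import torch
import torch.nn as nn


class SourceNetwork(nn.Module):
    """Generates a source excitation signal from a frame-level F0 contour."""

    def __init__(self, hop_size: int = 256, channels: int = 64):
        super().__init__()
        self.hop_size = hop_size
        # Small conv stack that refines a sine + noise prior into an excitation.
        self.net = nn.Sequential(
            nn.Conv1d(2, channels, kernel_size=9, padding=4),
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, 1, kernel_size=9, padding=4),
        )

    def forward(self, f0: torch.Tensor, sample_rate: int = 24000) -> torch.Tensor:
        # f0: (batch, 1, frames) in Hz; upsample to the waveform sample rate.
        f0_up = torch.repeat_interleave(f0, self.hop_size, dim=-1)
        phase = 2.0 * math.pi * torch.cumsum(f0_up / sample_rate, dim=-1)
        sine = torch.sin(phase) * (f0_up > 0).float()   # sine prior in voiced frames
        noise = torch.randn_like(sine) * 0.1            # aperiodic/unvoiced component
        return self.net(torch.cat([sine, noise], dim=1))  # (batch, 1, samples)


class FilterNetwork(nn.Module):
    """Shapes the excitation into a waveform, conditioned on acoustic features."""

    def __init__(self, feat_dim: int = 80, channels: int = 64, hop_size: int = 256):
        super().__init__()
        self.cond_up = nn.Upsample(scale_factor=hop_size, mode="nearest")
        self.cond_proj = nn.Conv1d(feat_dim, channels, kernel_size=1)
        self.in_proj = nn.Conv1d(1, channels, kernel_size=1)
        # Dilated convolutions widen the receptive field for resonance filtering.
        self.blocks = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3, dilation=d, padding=d)
            for d in (1, 2, 4, 8)
        )
        self.out_proj = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, excitation: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, feat_dim, frames), e.g. a mel-spectrogram.
        x = self.in_proj(excitation) + self.cond_proj(self.cond_up(feats))
        for block in self.blocks:
            x = x + torch.tanh(block(x))                # residual filtering stages
        return torch.tanh(self.out_proj(x))             # (batch, 1, samples)


class SourceFilterGenerator(nn.Module):
    """Source and filter parts in one generator, optimized jointly end to end."""

    def __init__(self):
        super().__init__()
        self.source = SourceNetwork()
        self.filter = FilterNetwork()

    def forward(self, f0: torch.Tensor, feats: torch.Tensor):
        excitation = self.source(f0)
        waveform = self.filter(excitation, feats)
        # Returning the excitation makes it possible to attach regularization
        # losses that keep it close to a plausible source signal, as the
        # abstract describes.
        return waveform, excitation


if __name__ == "__main__":
    gen = SourceFilterGenerator()
    f0 = torch.full((1, 1, 20), 150.0)    # 20 frames of a 150 Hz contour
    feats = torch.randn(1, 80, 20)        # dummy 80-dim acoustic features
    wav, exc = gen(f0, feats)
    print(wav.shape, exc.shape)           # torch.Size([1, 1, 5120]) for both
```

Because the F0 contour drives the excitation explicitly, pitch can be transformed at inference time simply by scaling the input F0, which is the controllability property the paper evaluates.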
Pages: 3717-3729
Number of pages: 13