ADAPTIVE WAVENET VOCODER FOR RESIDUAL COMPENSATION IN GAN-BASED VOICE CONVERSION

Times cited: 0
Authors
Sisman, Berrak [1, 2, 3]
Zhang, Mingyang [1]
Sakti, Sakriani [2, 3]
Li, Haizhou [1]
Nakamura, Satoshi [2, 3]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] Nara Inst Sci & Technol, Nara, Japan
[3] RIKEN, Ctr Adv Intelligence Project AIP, Tokyo, Japan
Keywords
voice conversion; generative adversarial networks; adaptive Wavenet; residual compensation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose to use generative adversarial networks (GANs) together with a WaveNet vocoder to address the over-smoothing problem that arises in deep learning approaches to voice conversion, and to improve vocoding quality over traditional vocoders. Because a GAN aims to minimize the divergence between the distributions of natural and converted speech parameters, it effectively alleviates over-smoothing in the converted speech. In addition, the WaveNet vocoder allows us to leverage speech from a large speaker population, thereby improving the naturalness of the synthetic voice. Furthermore, for the first time, we study how to use the WaveNet vocoder for residual compensation to improve voice conversion performance. Experiments show that the proposed voice conversion framework consistently outperforms the baselines.
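The abstract credits the GAN's divergence-minimizing objective with alleviating over-smoothing. A common explanation for over-smoothing is that a model trained only with mean squared error (MSE) averages over the natural variability of its targets. The Python sketch below is a toy illustration of that reasoning under stated assumptions, not the paper's method: it shows an MSE-optimal constant prediction collapsing to the mean of a two-valued target, and it writes out a standard GAN discriminator loss and a generator loss that adds a weighted adversarial term to the MSE. The function names, the one-dimensional "features", and the `adv_weight` parameter are hypothetical; the abstract does not specify the exact loss the authors use.

```python
# Toy sketch, NOT the authors' code: contrasts MSE-only training (which
# over-smooths) with a generator objective that adds an adversarial term.
# All names, the 1-D "features", and adv_weight are hypothetical.
import math
import statistics


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def best_constant_under_mse(targets):
    """The constant c minimising mean((c - t)^2) is the sample mean,
    so an MSE-trained model averages over multimodal targets."""
    return statistics.fmean(targets)


def discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss, -log D(y) - log(1 - D(y_hat)),
    averaged over frames. Inputs are discriminator probabilities in (0, 1)."""
    n = len(d_real)
    return -sum(math.log(r) + math.log(1.0 - f) for r, f in zip(d_real, d_fake)) / n


def generator_loss(pred, target, d_fake, adv_weight=1.0):
    """MSE plus a weighted non-saturating adversarial term -log D(y_hat).
    The adversarial term rewards outputs the discriminator judges natural."""
    adv = -sum(math.log(f) for f in d_fake) / len(d_fake)
    return mse(pred, target) + adv_weight * adv


if __name__ == "__main__":
    # Two equally likely "natural" values for the same input context.
    targets = [-1.0, 1.0, -1.0, 1.0]
    c = best_constant_under_mse(targets)
    smoothed = [c] * len(targets)
    print(c)                                # 0.0
    print(statistics.pvariance(targets))    # 1.0
    print(statistics.pvariance(smoothed))   # 0.0
    print(mse(smoothed, targets))           # 1.0
```

The MSE-optimal output has zero variance even though the targets vary, which is the over-smoothing effect in miniature. With the MSE held fixed, a prediction the discriminator rates as more natural (a larger `d_fake`) receives a lower generator loss; this is how the adversarial term pushes converted parameters toward the natural distribution.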
Pages: 282-289
Number of pages: 8
Related papers
50 in total
  • [1] Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion
    Huang, Wen-Chin
    Wu, Yi-Chiao
    Hwang, Hsin-Te
    Tobing, Patrick Lumban
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    Tsao, Yu
    Wang, Hsin-Min
    [J]. 2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019
  • [2] WaveNet Vocoder with Limited Training Data for Voice Conversion
    Liu, Li-Juan
    Ling, Zhen-Hua
    Jiang, Yuan
    Zhou, Ming
    Dai, Li-Rong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 1983-1987
  • [3] AN EVALUATION OF DEEP SPECTRAL MAPPINGS AND WAVENET VOCODER FOR VOICE CONVERSION
    Tobing, Patrick Lumban
    Hayashi, Tomoki
    Wu, Yi-Chiao
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018: 297-303
  • [4] Voice Conversion With CycleRNN-Based Spectral Mapping and Finely Tuned WaveNet Vocoder
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. IEEE ACCESS, 2019, 7: 171114-171125
  • [5] High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder
    Chen, Kuan
    Chen, Bo
    Lai, Jiahao
    Yu, Kai
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 1993-1997
  • [6] An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2020, 9
  • [7] STATISTICAL VOICE CONVERSION BASED ON WAVENET
    Niwa, Jumpei
    Yoshimura, Takenori
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5289-5293
  • [8] VOICE CONVERSION WITH CYCLIC RECURRENT NEURAL NETWORK AND FINE-TUNED WAVENET VOCODER
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6815-6819
  • [9] Non-Parallel Voice Conversion System With WaveNet Vocoder and Collapsed Speech Suppression
    Wu, Yi-Chiao
    Tobing, Patrick Lumban
    Kobayashi, Kazuhiro
    Hayashi, Tomoki
    Toda, Tomoki
    [J]. IEEE ACCESS, 2020, 8: 62094-62106
  • [10] A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder
    Sisman, Berrak
    Zhang, Mingyang
    Li, Haizhou
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 1978-1982