Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

被引:32
|
作者
Huang, Wen-Chin [1 ]
Wu, Yi-Chiao [2 ]
Hwang, Hsin-Te [1 ]
Tobing, Patrick Lumban [2 ]
Hayashi, Tomoki [2 ]
Kobayashi, Kazuhiro [2 ]
Toda, Tomoki [2 ]
Tsao, Yu [1 ]
Wang, Hsin-Min [1 ]
机构
[1] Acad Sinica, Taipei, Taiwan
[2] Nagoya Univ, Nagoya, Aichi, Japan
关键词
voice conversion; variational autoencoder; WaveNet vocoder; speaker adaptation; NEURAL-NETWORKS;
D O I
10.23919/eusipco.2019.8902651
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
This paper presents a refinement framework of WaveNet vocoders for variational autoencoder (VAE) based voice conversion (VC), which reduces the quality distortion caused by the mismatch between the training data and testing data. Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch often causes significant quality and similarity degradation. In this work, we take advantage of the particular structure of VAEs to refine WaveNet vocoders with the self-reconstructed features generated by VAE, which are of similar characteristics with the converted features while having the same temporal structure with the target natural features. We analyze these features and show that the self-reconstructed features are similar to the converted features. Objective and subjective experimental results demonstrate the effectiveness of our proposed framework.
引用
收藏
页数:5
相关论文
共 50 条
  • [1] WaveNet Vocoder with Limited Training Data for Voice Conversion
    Liu, Li-Juan
    Ling, Zhen-Hua
    Yuan-Jiang
    Ming-Zhou
    Dai, Li-Rong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1983 - 1987
  • [2] ATTENTION-BASED WAVENET AUTOENCODER FOR UNIVERSAL VOICE CONVERSION
    Polyak, Adam
    Wolf, Lior
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6800 - 6804
  • [3] ADAPTIVE WAVENET VOCODER FOR RESIDUAL COMPENSATION IN GAN-BASED VOICE CONVERSION
    Sisman, Berrak
    Zhang, Mingyang
    Sakti, Sakriani
    Li, Haizhou
    Nakamura, Satoshi
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 282 - 289
  • [4] Cross-Lingual Voice Conversion using a Cyclic Variational Auto-encoder and a WaveNet Vocoder
    Nakatani, Hikaru
    Tobing, Patrick Lumban
    Takeda, Kazuya
    Toda, Tomoki
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 520 - 526
  • [5] AN EVALUATION OF DEEP SPECTRAL MAPPINGS AND WAVENET VOCODER FOR VOICE CONVERSION
    Tobing, Patrick Lumban
    Hayashi, Tomoki
    Wu, Yi-Chiao
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 297 - 303
  • [6] Voice Conversion With CycleRNN-Based Spectral Mapping and Finely Tuned WaveNet Vocoder
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. IEEE ACCESS, 2019, 7 : 171114 - 171125
  • [7] High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder
    Chen, Kuan
    Chen, Bo
    Lai, Jiahao
    Yu, Kai
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1993 - 1997
  • [8] An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2020, 9 (01)
  • [9] STATISTICAL VOICE CONVERSION BASED ON WAVENET
    Niwa, Jumpei
    Yoshimura, Takenori
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5289 - 5293
  • [10] VOICE CONVERSION WITH CYCLIC RECURRENT NEURAL NETWORK AND FINE-TUNED WAVENET VOCODER
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6815 - 6819