STATISTICAL VOICE CONVERSION BASED ON WAVENET

Authors
Niwa, Jumpei [1]
Yoshimura, Takenori [1]
Hashimoto, Kei [1]
Oura, Keiichiro [1]
Nankaku, Yoshihiko [1]
Tokuda, Keiichi [1]
Affiliations
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi, Japan
Keywords
Voice conversion; WaveNet; Deep Neural Network; statistical model
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper proposes a voice conversion technique based on WaveNet that directly generates target audio waveforms from the acoustic features of a source speaker. In statistical voice conversion, the relation between acoustic features, such as spectral parameters, extracted from source and target audio waveforms is generally modeled using statistical models such as Gaussian mixture models and neural networks. Although modeling the relation between acoustic features is reasonable and efficient, these models are not optimized for predicting target audio waveforms because vocoder parameters are used as intermediate representations. To overcome this problem, we developed a voice conversion method that uses WaveNet, a generative model for audio waveforms, to model the relation between target audio waveforms and acoustic features extracted from source audio waveforms. The proposed model can directly generate converted audio waveforms without a vocoder. Experimental results indicate that the proposed method generates more natural-sounding converted speech than a conventional DNN-based method.
Pages: 5289-5293
Page count: 5
Related papers
50 in total
  • [1] Statistical voice conversion with WaveNet-based waveform generation
    Kobayashi, Kazuhiro
    Hayashi, Tomoki
    Tamamori, Akira
    Toda, Tomoki
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1138 - 1142
  • [2] Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion
    Huang, Wen-Chin
    Wu, Yi-Chiao
    Hwang, Hsin-Te
    Tobing, Patrick Lumban
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    Tsao, Yu
    Wang, Hsin-Min
    [J]. 2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [3] ATTENTION-BASED WAVENET AUTOENCODER FOR UNIVERSAL VOICE CONVERSION
    Polyak, Adam
    Wolf, Lior
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6800 - 6804
  • [4] Factorized WaveNet for voice conversion with limited data
    Du, Hongqiang
    Tian, Xiaohai
    Xie, Lei
    Li, Haizhou
    [J]. SPEECH COMMUNICATION, 2021, 130 : 45 - 54
  • [5] WaveNet Vocoder with Limited Training Data for Voice Conversion
    Liu, Li-Juan
    Ling, Zhen-Hua
    Jiang, Yuan
    Zhou, Ming
    Dai, Li-Rong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1983 - 1987
  • [6] EFFECTIVE WAVENET ADAPTATION FOR VOICE CONVERSION WITH LIMITED DATA
    Du, Hongqiang
    Tian, Xiaohai
    Xie, Lei
    Li, Haizhou
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7779 - 7783
  • [7] ADAPTIVE WAVENET VOCODER FOR RESIDUAL COMPENSATION IN GAN-BASED VOICE CONVERSION
    Sisman, Berrak
    Zhang, Mingyang
    Sakti, Sakriani
    Li, Haizhou
    Nakamura, Satoshi
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 282 - 289
  • [8] WAVENET FACTORIZATION WITH SINGULAR VALUE DECOMPOSITION FOR VOICE CONVERSION
    Du, Hongqiang
    Tian, Xiaohai
    Xie, Lei
    Li, Haizhou
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 152 - 159
  • [9] Voice Conversion With CycleRNN-Based Spectral Mapping and Finely Tuned WaveNet Vocoder
    Tobing, Patrick Lumban
    Wu, Yi-Chiao
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. IEEE ACCESS, 2019, 7 : 171114 - 171125
  • [10] High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder
    Chen, Kuan
    Chen, Bo
    Lai, Jiahao
    Yu, Kai
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1993 - 1997