Continuous vocoder applied in deep neural network based voice conversion

Authors
Mohammed Salah Al-Radhi
Tamás Gábor Csapó
Géza Németh
Affiliations
[1] Budapest University of Technology and Economics, Department of Telecommunications and Media Informatics
[2] MTA-ELTE Lendület Lingual Articulation Research Group
Source
Multimedia Tools and Applications, 2019, 78 (23)
Keywords
Voice conversion; Continuous vocoder; Neural network; Speech synthesis
DOI
Not available
Abstract
In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using a deep neural network, where multiple features from the speech of two speakers (source and target) are converted acoustically. Traditional conversion methods focus on the prosodic feature represented by the discontinuous fundamental frequency (F0) and on the spectral envelope. Studies have shown that speech analysis/synthesis solutions play an important role in the overall quality of the converted voice. Recently, we proposed a new continuous vocoder, originally for statistical parametric speech synthesis, in which all parameters are continuous. This work therefore introduces a new method that uses a continuous F0 (contF0) in SVC to avoid alignment errors that may occur in voiced and unvoiced segments and can degrade the converted speech. Our contributions are as follows. (1) We integrate the continuous vocoder, which provides an advanced model of the excitation signal, into the SVC framework by converting its contF0, maximum voiced frequency, and spectral features. (2) We show that a feed-forward deep neural network (FF-DNN) using our vocoder yields high-quality conversion. (3) We apply a geometric approach to spectral subtraction (GA-SS) in the final stage of the proposed framework to improve the signal-to-noise ratio of the converted speech. Our experimental results, using two male speakers and one female speaker, show that the speech converted with the proposed SVC technique is similar to the target speaker and gives state-of-the-art performance as measured by objective evaluation and subjective listening tests.
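The difference between a discontinuous and a continuous F0 track can be illustrated with a small sketch. It assumes unvoiced frames are marked with 0.0 and fills them by linear interpolation between neighbouring voiced frames. This is only a simple stand-in for illustration: the abstract does not say how the vocoder estimates contF0, and the function name `interpolate_f0` and the sample values are hypothetical.

```python
def interpolate_f0(f0):
    """Return a gap-free F0 track (assumption: unvoiced frames are 0.0)."""
    voiced = [i for i, value in enumerate(f0) if value > 0.0]
    if not voiced:
        return list(f0)  # no voiced frame to interpolate from

    out = list(f0)
    # Leading and trailing unvoiced frames: hold the nearest voiced value.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    # Interior gaps: interpolate linearly between the surrounding voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for k in range(1, gap):
            out[left + k] = f0[left] + (f0[right] - f0[left]) * k / gap
    return out


# Hypothetical frame-level F0 values in Hz; 0.0 marks an unvoiced frame.
f0_discontinuous = [0.0, 120.0, 124.0, 0.0, 0.0, 0.0, 132.0, 130.0, 0.0]
f0_continuous = interpolate_f0(f0_discontinuous)
print(f0_continuous)
```

With a gap-free track, every frame carries a usable F0 value, so a frame-wise mapping between source and target speakers never has to pair a voiced frame with an unvoiced one.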
Pages: 33549-33572
Page count: 23
Citation
Al-Radhi, Mohammed Salah; Csapó, Tamás Gábor; Németh, Géza. Continuous vocoder applied in deep neural network based voice conversion [J]. Multimedia Tools and Applications, 2019, 78 (23): 33549-33572