High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder

Cited by: 0
Authors
Chen, Kuan [1]
Chen, Bo [1]
Lai, Jiahao [1]
Yu, Kai [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Brain Sci & Technol Res Ctr, SpeechLab, Dept Comp Sci & Engn, Shanghai, Peoples R China
Keywords
voice conversion; WaveNet vocoder; mel-frequency spectrogram; LSTM-RNN; SYSTEM; TIME
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The waveform generator is a key component in voice conversion. Recently, a WaveNet waveform generator conditioned on the Mel-cepstrum (Mcep) has shown better quality than a standard vocoder. In this paper, an enhanced WaveNet model based on the spectrogram is proposed to further improve voice conversion performance. Here, the Mel-frequency spectrogram is converted from source speaker to target speaker using an LSTM-RNN based frame-to-frame feature mapping. To evaluate performance, the proposed approach is compared with an Mcep-based LSTM-RNN voice conversion system. Both the STRAIGHT vocoder and an Mcep-based WaveNet vocoder are selected to produce the converted speech for the Mcep conversion system. The fundamental frequency (F0) of the converted speech in the different systems is analyzed. Naturalness, similarity, and intelligibility are evaluated with subjective measures. Results show that the spectrogram-based WaveNet waveform generator achieves better voice conversion quality than traditional WaveNet approaches. The Mel-spectrogram based voice conversion achieves significant improvement in speaker similarity and inherent F0 conversion.
Pages: 1993-1997
Page count: 5