Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training

Cited by: 174
Authors:
Chen, Ling-Hui [1 ]
Ling, Zhen-Hua [1 ]
Liu, Li-Juan [1 ]
Dai, Li-Rong [1 ]
Affiliations:
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230027, Peoples R China
Keywords:
Bidirectional associative memory; deep neural network; Gaussian mixture model; restricted Boltzmann machine; spectral envelope conversion; voice conversion; SPEECH; HMM;
DOI: 10.1109/TASLP.2014.2353991
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract:
This paper presents a new spectral envelope conversion method using deep neural networks (DNNs). Conventional joint density Gaussian mixture model (JDGMM) based spectral conversion methods perform stably and effectively. However, the speech generated by these methods suffers severe quality degradation due to two factors: 1) the inadequacy of the JDGMM in modeling both the distribution of spectral features and the non-linear mapping relationship between the source and target speakers, and 2) the loss of spectral detail caused by the use of high-level spectral features such as mel-cepstra. Previously, we proposed the mixture of restricted Boltzmann machines (MoRBM) and the mixture of Gaussian bidirectional associative memories (MoGBAM) to cope with these problems. In this paper, we propose to use a DNN to construct a global non-linear mapping relationship between the spectral envelopes of two speakers. The proposed DNN is generatively trained by cascading two RBMs, which model the distributions of the spectral envelopes of the source and target speakers respectively, using a Bernoulli BAM (BBAM). The proposed training method therefore takes advantage of both the strong ability of RBMs to model the distribution of spectral envelopes and the superiority of BAMs in deriving the conditional distributions needed for conversion. Careful comparisons and analyses between the proposed method and several conventional methods are presented. The subjective results show that the proposed method significantly improves both similarity and naturalness compared to conventional methods.
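To make the layer-wise generative training idea in the abstract concrete, the following is a minimal, hedged sketch: two Bernoulli RBMs are pretrained with 1-step contrastive divergence, one on each speaker's features, and their weights then initialize the outer layers of a feed-forward mapping network. This is an illustration on toy binary data, not the paper's implementation; in particular, the middle layer here is randomly initialized as a placeholder, whereas the paper connects the two hidden codes with a BBAM trained on paired data, and real systems operate on continuous spectral envelopes rather than binary vectors. All names (`RBM`, `cd1_step`, `convert`) are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal Bernoulli RBM trained with 1-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible bias
        self.c = np.zeros(n_hidden)   # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        # Positive phase, one Gibbs step, then the CD-1 gradient estimate.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

# Toy binary stand-ins for source- and target-speaker features.
X_src = (rng.random((200, 16)) < 0.5).astype(float)
X_tgt = (rng.random((200, 16)) < 0.5).astype(float)

# Layer-wise generative pretraining: one RBM models each speaker's
# feature distribution independently.
rbm_src = RBM(16, 8)
rbm_tgt = RBM(16, 8)
for _ in range(50):
    rbm_src.cd1_step(X_src)
    rbm_tgt.cd1_step(X_tgt)

# Cascade the pretrained RBMs into a mapping network: the source RBM
# supplies the first layer, the target RBM (transposed) the last layer.
# The middle weights are a random placeholder where the paper uses a BBAM.
W1, b1 = rbm_src.W, rbm_src.c
W2, b2 = rng.normal(0.0, 0.01, size=(8, 8)), np.zeros(8)
W3, b3 = rbm_tgt.W.T, rbm_tgt.b

def convert(v):
    """Map source features through the cascaded network toward target features."""
    h1 = sigmoid(v @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)

y = convert(X_src[:4])
print(y.shape)  # (4, 16)
```

After this generative initialization, the full network would normally be fine-tuned discriminatively on paired source-target frames (e.g. by backpropagation), which is the step that turns the stacked generative models into an actual conversion function.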
Pages: 1859-1872 (14 pages)