Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training

Cited by: 174
Authors:
Chen, Ling-Hui [1 ]
Ling, Zhen-Hua [1 ]
Liu, Li-Juan [1 ]
Dai, Li-Rong [1 ]
Affiliations:
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230027, Peoples R China
Keywords:
Bidirectional associative memory; deep neural network; Gaussian mixture model; restricted Boltzmann machine; spectral envelope conversion; voice conversion; SPEECH; HMM;
DOI: 10.1109/TASLP.2014.2353991
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract:
This paper presents a new spectral envelope conversion method using deep neural networks (DNNs). Conventional joint density Gaussian mixture model (JDGMM) based spectral conversion methods perform stably and effectively. However, the speech generated by these methods suffers severe quality degradation due to two factors: 1) the inadequacy of the JDGMM in modeling both the distribution of spectral features and the non-linear mapping relationship between the source and target speakers, and 2) the loss of spectral detail caused by the use of high-level spectral features such as mel-cepstra. Previously, we proposed the mixture of restricted Boltzmann machines (MoRBM) and the mixture of Gaussian bidirectional associative memories (MoGBAM) to cope with these problems. In this paper, we propose to use a DNN to construct a global non-linear mapping relationship between the spectral envelopes of two speakers. The proposed DNN is generatively trained by cascading two RBMs, which model the distributions of the spectral envelopes of the source and target speakers respectively, using a Bernoulli BAM (BBAM). The proposed training method therefore takes advantage of both the strong ability of RBMs to model the distribution of spectral envelopes and the superiority of BAMs in deriving the conditional distributions needed for conversion. Careful comparisons and analyses between the proposed method and several conventional methods are presented. The subjective results show that the proposed method significantly improves both similarity and naturalness compared to conventional methods.
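To make the layer-wise generative training idea in the abstract concrete, the following is a minimal, hedged sketch: two Bernoulli RBMs are pretrained with 1-step contrastive divergence, one on each speaker's features, and their weights then initialize the outer layers of a feed-forward mapping network. This is an illustration on toy binary data, not the paper's implementation; in particular, the middle layer here is randomly initialized as a placeholder, whereas the paper connects the two hidden codes with a BBAM trained on paired data, and real systems operate on continuous spectral envelopes rather than binary vectors. All names (`RBM`, `cd1_step`, `convert`) are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal Bernoulli RBM trained with 1-step contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible bias
        self.c = np.zeros(n_hidden)   # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        # Positive phase, one Gibbs step, then the CD-1 gradient estimate.
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

# Toy binary stand-ins for source- and target-speaker features.
X_src = (rng.random((200, 16)) < 0.5).astype(float)
X_tgt = (rng.random((200, 16)) < 0.5).astype(float)

# Layer-wise generative pretraining: one RBM models each speaker's
# feature distribution independently.
rbm_src = RBM(16, 8)
rbm_tgt = RBM(16, 8)
for _ in range(50):
    rbm_src.cd1_step(X_src)
    rbm_tgt.cd1_step(X_tgt)

# Cascade the pretrained RBMs into a mapping network: the source RBM
# supplies the first layer, the target RBM (transposed) the last layer.
# The middle weights are a random placeholder where the paper uses a BBAM.
W1, b1 = rbm_src.W, rbm_src.c
W2, b2 = rng.normal(0.0, 0.01, size=(8, 8)), np.zeros(8)
W3, b3 = rbm_tgt.W.T, rbm_tgt.b

def convert(v):
    """Map source features through the cascaded network toward target features."""
    h1 = sigmoid(v @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)

y = convert(X_src[:4])
print(y.shape)  # (4, 16)
```

After this generative initialization, the full network would normally be fine-tuned discriminatively on paired source-target frames (e.g. by backpropagation), which is the step that turns the stacked generative models into an actual conversion function.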
Pages: 1859-1872 (14 pages)