Text-Independent Voice Conversion Using Deep Neural Network Based Phonetic Level Features

Cited: 0
Authors
Zheng, Huadi [3]
Cai, Weicheng [1]
Zhou, Tianyan [1]
Zhang, Shilei [4]
Li, Ming [1,2]
Affiliations
[1] Sun Yat Sen Univ, SYSU CMU Joint Inst Engn, Guangzhou, Guangdong, Peoples R China
[2] SYSU CMU Shunde Int Joint Res Inst, Shunde, Guangdong, Peoples R China
[3] Hong Kong Polytech Univ, Dept EIE, Hong Kong, Hong Kong, Peoples R China
[4] IBM China Res, Speech Technol & Solut Grp, Beijing, Peoples R China
Source
2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2016
Funding
National Natural Science Foundation of China;
Keywords
Gaussian mixture model; phoneme posterior probability; voice conversion; deep neural network; MAXIMUM-LIKELIHOOD-ESTIMATION; REPRESENTATION; EXTRACTION;
DOI
None
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a phonetically-aware joint density Gaussian mixture model (JD-GMM) framework for voice conversion that no longer requires parallel data from the source speaker at the training stage. Since phonetic-level features carry the text information that must be preserved in the conversion task, we propose a method that concatenates only the phonetic discriminant features and spectral features extracted from the same target speaker's speech to train a JD-GMM. Once the mapping between these two feature streams is trained, phonetic discriminant features from the source speaker can be used to estimate the target speaker's spectral features at the conversion stage. The phonetic discriminant features are extracted by applying PCA to the output layer of a deep neural network (DNN) in an automatic speech recognition (ASR) system; they can be viewed as a low-dimensional representation of the senone posteriors. We compare the proposed phonetically-aware method with the conventional JD-GMM method on the Voice Conversion Challenge 2016 training database. The experimental results show that the proposed phonetically-aware feature method obtains performance similar to the conventional JD-GMM while using only target speech as training data.
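The conversion step the abstract describes is the standard JD-GMM regression: fit a GMM on joint vectors z = [x; y] (phonetic features x, spectral features y from the target speaker), then map an unseen x to the posterior-weighted conditional mean of y. The sketch below illustrates that mapping on synthetic data; the dimensions, component count, and the synthetic stand-ins for PCA-reduced senone posteriors are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal JD-GMM mapping sketch (synthetic data, illustrative dimensions).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
Dx, Dy, M = 4, 3, 2  # phonetic dim, spectral dim, mixture components

# Synthetic stand-ins for phonetic discriminant features (x) and spectral
# features (y), extracted frame-by-frame from the target speaker's speech.
A = rng.standard_normal((Dx, Dy))
x_train = rng.standard_normal((500, Dx))
y_train = x_train @ A + 0.05 * rng.standard_normal((500, Dy))

# Train the joint-density GMM on concatenated features z = [x; y].
gmm = GaussianMixture(n_components=M, covariance_type="full", random_state=0)
gmm.fit(np.hstack([x_train, y_train]))

def jdgmm_convert(x):
    """Estimate spectral features from phonetic features via the
    per-frame conditional-mean (MMSE) GMM mapping."""
    n = len(x)
    log_r = np.zeros((n, M))
    for m in range(M):
        mu_x = gmm.means_[m, :Dx]
        Sxx = gmm.covariances_[m][:Dx, :Dx]
        diff = x - mu_x
        _, logdet = np.linalg.slogdet(Sxx)
        sol = np.linalg.solve(Sxx, diff.T).T
        # Unnormalized log responsibility under the marginal GMM over x.
        log_r[:, m] = (np.log(gmm.weights_[m]) - 0.5 * logdet
                       - 0.5 * np.einsum("nd,nd->n", diff, sol))
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)  # P(m | x) per frame

    y_hat = np.zeros((n, Dy))
    for m in range(M):
        mu_x, mu_y = gmm.means_[m, :Dx], gmm.means_[m, Dx:]
        S = gmm.covariances_[m]
        Sxx, Sxy = S[:Dx, :Dx], S[:Dx, Dx:]
        # Conditional mean E[y | x, m] = mu_y + (x - mu_x) Sxx^{-1} Sxy.
        cond_mean = mu_y + (x - mu_x) @ np.linalg.solve(Sxx, Sxy)
        y_hat += r[:, [m]] * cond_mean
    return y_hat

x_test = rng.standard_normal((50, Dx))
y_pred = jdgmm_convert(x_test)
```

In the paper's setting, x would come from the source speaker at conversion time, which is what makes the approach text-independent: no parallel source-target frames are needed during training.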
Pages: 2872-2877
Page count: 6
Related papers
(50 total)
  • [1] Text-Independent Voice Conversion Based on Kernel Eigenvoice
    Li, Yanping
    Zhang, Linghua
    Ding, Hui
    ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT I, 2010, 6319 : 432 - +
  • [2] Text-independent voice conversion based on unit selection
    Suendermann, David
    Hoege, Harald
    Bonafonte, Antonio
    Ney, Hermann
    Black, Alan
    Narayanan, Shri
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 81 - 84
  • [3] Text-independent voice conversion based on state mapped codebook
    Zhang, Meng
    Tao, Jianhua
    Tian, Jilei
    Wang, Xia
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4605 - +
  • [4] Deep Neural Network Embeddings for Text-Independent Speaker Verification
    Snyder, David
    Garcia-Romero, Daniel
    Povey, Daniel
    Khudanpur, Sanjeev
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 999 - 1003
  • [5] Text-Independent Cross-Language Voice Conversion
    Suendermann, David
    Hoege, Harald
    Bonafonte, Antonio
    Ney, Hermann
    Hirschberg, Julia
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2262 - +
  • [6] PHONEME CLUSTER BASED STATE MAPPING FOR TEXT-INDEPENDENT VOICE CONVERSION
    Zhang, Meng
    Tao, Jianhua
    Nurminen, Jani
    Tian, Jilei
    Wang, Xia
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4281 - +
  • [7] Supervisory Data Alignment for Text-Independent Voice Conversion
    Tao, Jianhua
    Zhang, Meng
    Nurminen, Jani
    Tian, Jilei
    Wang, Xia
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (05): : 932 - 943
  • [8] Text-independent writer identification using convolutional neural network
    Hung Tuan Nguyen
    Cuong Tuan Nguyen
    Ino, Takeya
    Indurkhya, Bipin
    Nakagawa, Masaki
    PATTERN RECOGNITION LETTERS, 2019, 121 : 104 - 112
  • [9] Modified layer deep convolution neural network for text-independent speaker recognition
    Karthikeyan, V
    Priyadharsini, Suja S.
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2024, 36 (02) : 273 - 285
  • [10] Text-Independent Speaker Identification Through Feature Fusion and Deep Neural Network
    Jahangir, Rashid
    Teh, Ying Wah
    Memon, Nisar Ahmed
    Mujtaba, Ghulam
    Zareei, Mahdi
    Ishtiaq, Uzair
    Akhtar, Muhammad Zaheer
    Ali, Ihsan
    IEEE ACCESS, 2020, 8 : 32187 - 32202