Text-Independent Voice Conversion Using Deep Neural Network Based Phonetic Level Features

Cited: 0
Authors
Zheng, Huadi [3]
Cai, Weicheng [1]
Zhou, Tianyan [1]
Zhang, Shilei [4]
Li, Ming [1,2]
Affiliations
[1] Sun Yat Sen Univ, SYSU CMU Joint Inst Engn, Guangzhou, Guangdong, Peoples R China
[2] SYSU CMU Shunde Int Joint Res Inst, Shunde, Guangdong, Peoples R China
[3] Hong Kong Polytech Univ, Dept EIE, Hong Kong, Hong Kong, Peoples R China
[4] IBM China Res, Speech Technol & Solut Grp, Beijing, Peoples R China
Source
2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2016
Funding
National Natural Science Foundation of China;
Keywords
Gaussian mixture model; phoneme posterior probability; voice conversion; deep neural network; MAXIMUM-LIKELIHOOD-ESTIMATION; REPRESENTATION; EXTRACTION;
DOI
None
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a phonetically-aware joint density Gaussian mixture model (JD-GMM) framework for voice conversion that no longer requires parallel data from the source speaker at the training stage. Since phonetic-level features carry the text information that must be preserved in the conversion task, we propose a method that concatenates only the phonetic discriminant features and spectral features extracted from the same target speaker's speech to train a JD-GMM. Once the mapping between these two feature streams is trained, phonetic discriminant features from the source speaker can be used to estimate the target speaker's spectral features at the conversion stage. The phonetic discriminant features are extracted by applying PCA to the output layer of a deep neural network (DNN) in an automatic speech recognition (ASR) system; they can be viewed as a low-dimensional representation of the senone posteriors. We compare the proposed phonetically-aware method with the conventional JD-GMM method on the Voice Conversion Challenge 2016 training database. The experimental results show that the proposed phonetically-aware feature method obtains performance similar to the conventional JD-GMM while using only target speech as training data.
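The conversion step the abstract describes is the standard JD-GMM regression: fit a GMM on joint vectors z = [x; y] (phonetic features x, spectral features y from the target speaker), then map an unseen x to the posterior-weighted conditional mean of y. The sketch below illustrates that mapping on synthetic data; the dimensions, component count, and the synthetic stand-ins for PCA-reduced senone posteriors are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal JD-GMM mapping sketch (synthetic data, illustrative dimensions).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
Dx, Dy, M = 4, 3, 2  # phonetic dim, spectral dim, mixture components

# Synthetic stand-ins for phonetic discriminant features (x) and spectral
# features (y), extracted frame-by-frame from the target speaker's speech.
A = rng.standard_normal((Dx, Dy))
x_train = rng.standard_normal((500, Dx))
y_train = x_train @ A + 0.05 * rng.standard_normal((500, Dy))

# Train the joint-density GMM on concatenated features z = [x; y].
gmm = GaussianMixture(n_components=M, covariance_type="full", random_state=0)
gmm.fit(np.hstack([x_train, y_train]))

def jdgmm_convert(x):
    """Estimate spectral features from phonetic features via the
    per-frame conditional-mean (MMSE) GMM mapping."""
    n = len(x)
    log_r = np.zeros((n, M))
    for m in range(M):
        mu_x = gmm.means_[m, :Dx]
        Sxx = gmm.covariances_[m][:Dx, :Dx]
        diff = x - mu_x
        _, logdet = np.linalg.slogdet(Sxx)
        sol = np.linalg.solve(Sxx, diff.T).T
        # Unnormalized log responsibility under the marginal GMM over x.
        log_r[:, m] = (np.log(gmm.weights_[m]) - 0.5 * logdet
                       - 0.5 * np.einsum("nd,nd->n", diff, sol))
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)  # P(m | x) per frame

    y_hat = np.zeros((n, Dy))
    for m in range(M):
        mu_x, mu_y = gmm.means_[m, :Dx], gmm.means_[m, Dx:]
        S = gmm.covariances_[m]
        Sxx, Sxy = S[:Dx, :Dx], S[:Dx, Dx:]
        # Conditional mean E[y | x, m] = mu_y + (x - mu_x) Sxx^{-1} Sxy.
        cond_mean = mu_y + (x - mu_x) @ np.linalg.solve(Sxx, Sxy)
        y_hat += r[:, [m]] * cond_mean
    return y_hat

x_test = rng.standard_normal((50, Dx))
y_pred = jdgmm_convert(x_test)
```

In the paper's setting, x would come from the source speaker at conversion time, which is what makes the approach text-independent: no parallel source-target frames are needed during training.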
Pages: 2872-2877
Page count: 6
Related papers
(50 total)
  • [1] Text-Independent Voice Conversion Based on Kernel Eigenvoice
    Li, Yanping
    Zhang, Linghua
    Ding, Hui
    ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT I, 2010, 6319 : 432 - +
  • [2] Text-independent voice conversion based on unit selection
    Suendermann, David
    Hoege, Harald
    Bonafonte, Antonio
    Ney, Hermann
    Black, Alan
    Narayanan, Shri
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 81 - 84
  • [3] Text-independent voice conversion based on state mapped codebook
    Zhang, Meng
    Tao, Jianhua
    Tian, Jilei
    Wang, Xia
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4605 - +
  • [4] Deep Neural Network Embeddings for Text-Independent Speaker Verification
    Snyder, David
    Garcia-Romero, Daniel
    Povey, Daniel
    Khudanpur, Sanjeev
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 999 - 1003
  • [5] Text-Independent Cross-Language Voice Conversion
    Suendermann, David
    Hoege, Harald
    Bonafonte, Antonio
    Ney, Hermann
    Hirschberg, Julia
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2262 - +
  • [6] PHONEME CLUSTER BASED STATE MAPPING FOR TEXT-INDEPENDENT VOICE CONVERSION
    Zhang, Meng
    Tao, Jianhua
    Nurminen, Jani
    Tian, Jilei
    Wang, Xia
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4281 - +
  • [7] Supervisory Data Alignment for Text-Independent Voice Conversion
    Tao, Jianhua
    Zhang, Meng
    Nurminen, Jani
    Tian, Jilei
    Wang, Xia
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (05): : 932 - 943
  • [8] Text-independent writer identification using convolutional neural network
    Hung Tuan Nguyen
    Cuong Tuan Nguyen
    Ino, Takeya
    Indurkhya, Bipin
    Nakagawa, Masaki
    PATTERN RECOGNITION LETTERS, 2019, 121 : 104 - 112
  • [9] Modified layer deep convolution neural network for text-independent speaker recognition
    Karthikeyan, V
    Priyadharsini, Suja S.
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2024, 36 (02) : 273 - 285
  • [10] Text-Independent Speaker Identification Through Feature Fusion and Deep Neural Network
    Jahangir, Rashid
    Teh, Ying Wah
    Memon, Nisar Ahmed
    Mujtaba, Ghulam
    Zareei, Mahdi
    Ishtiaq, Uzair
    Akhtar, Muhammad Zaheer
    Ali, Ihsan
    IEEE ACCESS, 2020, 8 : 32187 - 32202