Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum

Cited by: 0
Authors
Toda, T [1]
Saruwatari, H [1]
Shikano, K [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara 6300101, Japan
Source
2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2, IND TECHNOL TRACK, DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS, NEURAL NETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING, MULTIMEDIA SIGNAL PROCESSING | 2001
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
In the voice conversion algorithm based on the Gaussian mixture model (GMM) applied to STRAIGHT, the quality of the converted speech is degraded because the converted spectrum is excessively smoothed. In this paper, we propose a GMM-based algorithm with dynamic frequency warping to avoid this over-smoothing. We also propose adding a weighted residual spectrum, defined as the difference between the GMM-based converted spectrum and the frequency-warped spectrum, to avoid degrading the conversion accuracy for speaker individuality. Evaluation experiments show that the proposed method yields better converted speech quality than the GMM-based algorithm and, when the residual spectrum is properly weighted, matches its conversion accuracy for speaker individuality.
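The weighted residual addition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy values, and the use of linear interpolation for the warping step are all assumptions made for illustration, and the warping path is taken as given rather than estimated by dynamic programming. The spectra are assumed to be one-frame vectors in whatever domain the residual is defined in (for example, log amplitude).

```python
import numpy as np

def warp_spectrum(spectrum, warp_path):
    """Resample a spectrum along a frequency-warping path.

    warp_path[k] is the (possibly fractional) input bin that output
    bin k is read from. Linear interpolation is an assumption here.
    """
    bins = np.arange(len(spectrum))
    return np.interp(warp_path, bins, spectrum)

def add_weighted_residual(gmm_spectrum, warped_spectrum, weight):
    """Add the weighted residual to the frequency-warped spectrum.

    The residual is the GMM-converted spectrum minus the warped one,
    so weight=0 returns the warped spectrum and weight=1 returns the
    GMM-converted spectrum.
    """
    residual = gmm_spectrum - warped_spectrum
    return warped_spectrum + weight * residual

# Toy one-frame example (hypothetical values, not data from the paper).
source = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
warp_path = np.array([0.0, 0.5, 1.0, 2.5, 4.0])   # assumed to be given
gmm_converted = np.array([1.0, 1.0, 2.0, 6.0, 14.0])

warped = warp_spectrum(source, warp_path)
blended = add_weighted_residual(gmm_converted, warped, weight=0.5)
```

With this formulation the weight trades off the two properties the abstract discusses: values near 0 keep the spectral detail of the frequency-warped spectrum, while values near 1 move the result toward the smoother GMM-converted spectrum.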
Pages: 841-844
Page count: 4
Related papers
50 in total
  • [1] Voice conversion using Viterbi algorithm based on Gaussian mixture model
    Jian Zhi-Hua
    Yang Zhen
    2007 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS, VOLS 1 AND 2, 2007, : 40 - 43
  • [2] Voice conversion algorithm using phoneme Gaussian mixture model
    Sheng, L
    Yin, JX
    Huang, JC
    PROCEEDINGS OF THE 2004 INTERNATIONAL SYMPOSIUM ON INTELLIGENT MULTIMEDIA, VIDEO AND SPEECH PROCESSING, 2004, : 5 - 8
  • [3] VOICE CONVERSION BASED ON MATRIX VARIATE GAUSSIAN MIXTURE MODEL
    Saito, Daisuke
    Doi, Hidenobu
    Minematsu, Nobuaki
    Hirose, Keikichi
    2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 567 - 571
  • [4] Voice Conversion Based on Weighted Frequency Warping
    Erro, Daniel
    Moreno, Asuncion
    Bonafonte, Antonio
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (05): : 922 - 931
  • [5] Speaker recognition based on dynamic time warping and Gaussian mixture model
    Zhang, Nannan
    Yao, Yanru
    PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 1174 - 1177
  • [6] Correlation-based Frequency Warping for Voice Conversion
    Tian, Xiaohai
    Wu, Zhizheng
    Lee, S. W.
    Chng, Eng Siong
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 211 - +
  • [7] SPARSE REPRESENTATION FOR FREQUENCY WARPING BASED VOICE CONVERSION
    Tian, Xiaohai
    Wu, Zhizheng
    Lee, Siu Wa
    Nguyen Quy Hy
    Chng, Eng Siong
    Dong, Minghui
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4235 - 4239
  • [8] Voice Conversion Using Structured Gaussian Mixture Model
    Zeng, Daojian
    Yu, Yibiao
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 541 - 544
  • [9] Efficient Gaussian Mixture Model Evaluation in Voice Conversion
    Tian, Jilei
    Nurminen, Jani
    Popa, Victor
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2282 - 2285
  • [10] Weighted Frequency Warping for Voice Conversion
    Erro, Daniel
    Moreno, Asuncion
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1465 - 1468