Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum

Cited by: 0
Authors
Toda, T [1]
Saruwatari, H [1]
Shikano, K [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara 6300101, Japan
Source
2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURAL NETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING | 2001
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
In the voice conversion algorithm based on the Gaussian Mixture Model (GMM) applied to STRAIGHT, the quality of converted speech is degraded because the converted spectrum is excessively smoothed. In this paper, we propose a GMM-based algorithm with dynamic frequency warping to avoid this over-smoothing. We also propose adding a weighted residual spectrum, which is the difference between the GMM-based converted spectrum and the frequency-warped spectrum, to avoid degrading the conversion accuracy for speaker individuality. Evaluation experiments show that, with a properly weighted residual spectrum, the proposed method yields better converted speech quality than the GMM-based algorithm while matching its conversion accuracy for speaker individuality.
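The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a warping function is already available (the abstract does not say how it is estimated), that spectra are plain lists of per-bin values (for example log amplitudes), and that the residual weight is a single scalar. The names `warp_spectrum` and `add_weighted_residual` are hypothetical.

```python
def warp_spectrum(spectrum, warp):
    """Resample a spectrum along a frequency-warping function.

    warp[k] is the (possibly fractional) source bin index whose value
    is placed in output bin k. Linear interpolation is an assumption
    made for this sketch.
    """
    n = len(spectrum)
    warped = []
    for pos in warp:
        pos = min(max(pos, 0.0), n - 1)   # clamp to the valid bin range
        lo = int(pos)                     # lower neighbouring bin
        hi = min(lo + 1, n - 1)           # upper neighbouring bin
        frac = pos - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def add_weighted_residual(warped, gmm_converted, weight):
    """Return warped + weight * (gmm_converted - warped), bin by bin.

    The residual is the difference between the GMM-based converted
    spectrum and the frequency-warped spectrum, as in the abstract.
    A scalar weight is an assumption of this sketch.
    """
    if len(warped) != len(gmm_converted):
        raise ValueError("spectra must have the same number of bins")
    return [w + weight * (g - w) for w, g in zip(warped, gmm_converted)]


# Toy values, purely illustrative (not data from the paper).
source = [0.0, 1.0, 2.0, 3.0]
warp = [0.0, 0.5, 1.5, 3.0]
gmm_converted = [1.0, 1.0, 1.0, 1.0]

warped = warp_spectrum(source, warp)
mixed = add_weighted_residual(warped, gmm_converted, 0.5)
```

With a weight of 0 the result is the frequency-warped spectrum alone, and with a weight of 1 it equals the GMM-based converted spectrum. Intermediate weights trade the detail kept by warping against the speaker-individuality accuracy of the GMM mapping, which is the balance the abstract refers to as a properly weighted residual.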
Pages: 841-844
Number of pages: 4
Related papers
50 in total
  • [21] Voice conversion using structured Gaussian mixture model in cepstrum eigenspace
    LI Yangchun
    YU Yibiao
    Chinese Journal of Acoustics, 2015, 34 (03) : 325 - 336
  • [22] Voice Conversion Using Gaussian Mixture Models
    D'souza, Kevin
    Talele, K. T. V.
    2015 INTERNATIONAL CONFERENCE ON COMMUNICATION, INFORMATION & COMPUTING TECHNOLOGY (ICCICT), 2015,
  • [23] Eigenvoice Conversion Based on Gaussian Mixture Model
    Toda, Tomoki
    Ohtani, Yamato
    Shikano, Kiyohiro
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2446 - 2449
  • [24] Voice Conversion Using Dynamic Frequency Warping With Amplitude Scaling, for Parallel or Nonparallel Corpora
    Godoy, Elizabeth
    Rosec, Olivier
    Chonavel, Thierry
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (04): : 1313 - 1323
  • [25] Phoneme-based spectral voice conversion using temporal decomposition and Gaussian mixture model
    Nguyen, Binh Phu
    Akagi, Masato
    2008 SECOND INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS, 2008, : 222 - 227
  • [26] Voice conversion based on matrix variate Gaussian mixture model using multiple frame features
    Yang, Yi
    Uchida, Hidetsugu
    Saito, Daisuke
    Minematsu, Nobuaki
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 302 - 306
  • [27] A Revisit to Feature Handling for High-quality Voice Conversion Based on Gaussian Mixture Model
    Suda, Hitoshi
    Kotani, Gaku
    Takamichi, Shinnosuke
    Saito, Daisuke
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 816 - 822
  • [28] Voice conversion by combining frequency warping with unit selection
    Shuang, Zhiwei
    Meng, Fanping
    Qin, Yong
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4661 - 4664
  • [29] A DYNAMIC GAUSSIAN PROCESS FOR VOICE CONVERSION
    Huang, Dong-Yan
    Dong, Minghui
    Li, Haizhou
    ELECTRONIC PROCEEDINGS OF THE 2013 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2013,
  • [30] Parametric Voice Conversion Based on Bilinear Frequency Warping Plus Amplitude Scaling
    Erro, Daniel
    Navas, Eva
    Hernaez, Inma
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (03): : 556 - 566