A Revisit to Feature Handling for High-quality Voice Conversion Based on Gaussian Mixture Model

Cited by: 0
Authors
Suda, Hitoshi [1]
Kotani, Gaku [1]
Takamichi, Shinnosuke [2]
Saito, Daisuke [1]
Affiliations
[1] Univ Tokyo, Grad Sch Engn, Tokyo, Japan
[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
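Keywords
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract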
This paper examines how the handling of acoustic features affects the quality of generated sounds in voice conversion (VC) systems based on Gaussian mixture models (GMMs). Work on improving VC quality has focused largely on the mapping models that convert acoustic features, whereas the components other than the mapping models have rarely been studied. The experimental results show that VC quality depends not only on the models but also on the methods used to analyze and synthesize utterances. This paper also investigates filtering methods for synthesis. To avoid the buzzy sounds generated by vocoders, differential-spectrum compensation is applied as an alternative way of synthesizing waveforms. Although mel log spectrum approximation (MLSA) filtering is traditionally used for differential-spectrum compensation, the experimental results indicate that the approximation in MLSA filtering degrades the quality of the synthesized speech. To avoid this approximation, this paper introduces an alternative filtering method named SP-WORLD, inspired by the WORLD vocoder framework. Subjective experiments demonstrate that SP-WORLD is comparable to MLSA filtering and outperforms it in some cases.
Pages: 816-822
Number of pages: 7