Voice Conversion Method Combining Segmental GMM Mapping with Target Frame Selection

Authors
Gu, Hung-Yan [1]
Tsai, Sung-Feng [1]
Affiliation
[1] Natl Taiwan Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Keywords
voice conversion; Gaussian mixture model; frame selection; discrete cepstrum coefficients; dynamic programming; maximum likelihood
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper proposes a voice conversion approach that combines two distinct ideas to improve converted-voice quality. The first idea is to map spectral features, e.g. discrete cepstrum coefficients (DCC), with segmental Gaussian mixture models (GMMs): a single GMM with a large number of mixture components is replaced by several voice-content-specific GMMs, each with far fewer mixture components. The second idea is to select, from the target-speaker frame pool of the segment class to which the input frame belongs, a frame whose spectral features are close to the mapped feature vector. Both ideas are intended to alleviate a problem encountered by traditional GMM-based conversion methods, namely that converted spectral envelopes are usually over-smoothed. To apply the first idea in an on-line voice conversion system, we propose an automatic GMM selection algorithm based on dynamic programming (DP). Furthermore, as previous researchers have pointed out, mapping with a single selected Gaussian probability density function (PDF), rather than a combination of several Gaussian PDFs, helps obtain better converted-voice quality. We therefore also propose a Gaussian PDF selection algorithm and integrate it into our system. To implement the second idea, we adopt a DP-based algorithm that considers both frame-matching and frame-connecting distances. To evaluate the two ideas, three voice conversion systems are constructed and used in listening tests. The results show that the system combining both ideas obtains much improved voice quality in addition to improved timbre similarity.
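The abstract gives no formulas for the DP-based frame selection, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes Euclidean distance for both the matching cost (mapped vector to candidate target frame) and the connecting cost (between consecutively selected target frames). The function name `select_frames`, the candidate count `k`, and the connecting weight `w` are hypothetical choices made for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_frames(mapped, pool, k=3, w=1.0):
    """Pick one target frame per mapped vector by dynamic programming.

    mapped: GMM-mapped feature vectors, one per input frame.
    pool:   target-speaker frames of the relevant segment class.
    k:      number of nearest pool frames kept as candidates per input frame (assumption).
    w:      weight of the connecting distance relative to the matching distance (assumption).
    Returns a list of pool indices, one per input frame.
    """
    if not mapped or not pool:
        return []
    # Candidate lists: the k pool frames closest to each mapped vector.
    cands = [sorted(range(len(pool)), key=lambda i: dist(m, pool[i]))[:k]
             for m in mapped]
    # cost[j] = best accumulated cost of a path ending at candidate j.
    cost = [dist(mapped[0], pool[i]) for i in cands[0]]
    back = []  # back[t - 1][j] = best predecessor of candidate j at time t
    for t in range(1, len(mapped)):
        new_cost, ptr = [], []
        for i in cands[t]:
            # Accumulated cost plus weighted connecting distance for each predecessor.
            trans = [cost[j] + w * dist(pool[p], pool[i])
                     for j, p in enumerate(cands[t - 1])]
            j_best = min(range(len(trans)), key=trans.__getitem__)
            new_cost.append(trans[j_best] + dist(mapped[t], pool[i]))
            ptr.append(j_best)
        cost = new_cost
        back.append(ptr)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [cands[-1][j]]
    for t in range(len(mapped) - 1, 0, -1):
        j = back[t - 1][j]
        path.append(cands[t - 1][j])
    path.reverse()
    return path
```

With `w=0` the search reduces to picking the nearest pool frame for every input frame independently. Raising `w` trades matching accuracy for smoother transitions between consecutive selected frames.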
Pages: 609-626
Page count: 18