MODEL-MAPPING BASED VOICE CONVERSION SYSTEM: A Novel Approach to Improve Voice Similarity and Naturalness using Model-based Speech Synthesis Techniques

Authors
Li, Baojie [1]
Wu, Dalei [1 ]
Jiang, Hui [1 ]
Affiliations
[1] York Univ, Dept Comp Sci & Engn, 4700 Keele St, Toronto, ON M3J 1P3, Canada
Keywords
Voice conversion; HMM-based speech synthesis; GMM; Model mapping
DOI
Not available
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
In this paper we present a novel voice conversion application in which no knowledge of the source speakers is available; only sufficient utterances from a target speaker and from a number of other speakers are in hand. Our approach consists of two separate stages. At the training stage, we estimate a speaker-dependent (SD) Gaussian mixture model (GMM) for the target speaker and, in addition, a speaker-independent (SI) GMM from the data of a number of speakers other than the source speaker. A mapping between the SD and SI models is maintained during training for each phone label. At the conversion stage, we use the SI GMM to recognize each input frame and find its closest Gaussian component. Next, according to a mapping list, the counterpart Gaussian of the SD GMM is obtained and used to generate a parameter vector for that frame. Finally, all the generated vectors are concatenated to synthesize speech of the target speaker. With the proposed model-mapping approach, we can not only avoid over-fitting by keeping the number of mixtures of the SI GMM fixed, but also improve voice quality in terms of similarity and naturalness by increasing the number of mixtures of the SD GMM. Experiments showed the effectiveness of this method.
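The conversion stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance Gaussians, a mapping list holding exactly one SD component index per SI component, and the SD component mean as the generated parameter vector. The names `log_gauss_diag`, `convert_frames`, and all arguments are hypothetical.

```python
import numpy as np


def log_gauss_diag(frames, means, variances):
    """Log density of each frame under each diagonal-covariance Gaussian.

    frames: (T, D); means, variances: (K, D). Returns an array of shape (T, K).
    """
    diff = frames[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    return -0.5 * (log_norm + mahalanobis)


def convert_frames(frames, si_weights, si_means, si_vars, mapping, sd_means):
    """Map each input frame to a target-speaker parameter vector.

    Assumption: `mapping[k]` is the index of the SD Gaussian paired with
    SI Gaussian k, and the SD mean stands in for the generated vector.
    """
    # Score every frame against every SI component (weighted log-likelihood).
    scores = np.log(si_weights)[None, :] + log_gauss_diag(frames, si_means, si_vars)
    # Closest SI Gaussian per frame.
    si_idx = np.argmax(scores, axis=1)
    # Look up the counterpart SD Gaussian through the mapping list.
    sd_idx = mapping[si_idx]
    # One generated vector per frame, in input order.
    return sd_means[sd_idx], si_idx, sd_idx
```

The returned vectors are kept in frame order, so concatenating them gives the parameter sequence passed to a synthesizer. Under these assumptions the SD model could hold more components than the SI model, since `mapping` only needs to point at valid rows of `sd_means`.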
Pages: 442-446
Number of pages: 5
    [J]. 2007 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS, VOLS 1 AND 2, 2007, : 40 - 43