IMPROVING VOICE QUALITY OF HMM-BASED SPEECH SYNTHESIS USING VOICE CONVERSION METHOD

Authors
Jiao, Yishan [1]
Xie, Xiang [1]
Na, Xingyu [1]
Tu, Ming [1]
Affiliation
[1] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China
Keywords
HMM-based speech synthesis; voice conversion; local linear transformation; temporal decomposition
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
HMM-based speech synthesis systems (HTS) often generate buzzy and muffled speech. This degradation in voice quality makes synthetic speech sound robotic rather than natural. From this observation, we suppose that synthetic speech lies in a speaker space different from that of the original speech. We propose using a voice conversion method to transform synthetic speech toward the original so as to improve its quality. Local linear transformation (LLT) combined with temporal decomposition (TD) is proposed as the conversion method. It not only ensures smooth spectral conversion but also avoids the over-smoothing problem. Moreover, we design a robust spectral selection and modification strategy to keep the modified spectra stable. A preference test shows that the proposed method can improve the quality of HMM-based speech synthesis.
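The abstract names local linear transformation as the conversion step but gives no formulation. As a rough illustration of the general idea only, and not the authors' method, the sketch below converts one synthetic-speech frame by fitting a per-dimension affine map on its nearest paired training frames. The function name `local_linear_convert`, the parameter `k`, the diagonal (per-dimension) form of the map, and the paired-frame inputs are all assumptions made for this example; temporal decomposition and the spectral selection strategy are not shown.

```python
import math


def local_linear_convert(x, src_frames, tgt_frames, k=4):
    """Convert one source frame with a locally fitted per-dimension affine map.

    Hypothetical sketch: src_frames[i] and tgt_frames[i] are assumed to be
    time-aligned synthetic/natural spectral feature vectors of equal length.
    """
    # Pick the k paired training frames whose source side is closest to x.
    order = sorted(range(len(src_frames)),
                   key=lambda i: math.dist(x, src_frames[i]))
    nbrs = order[:k]

    converted = []
    for d in range(len(x)):
        xs = [src_frames[i][d] for i in nbrs]
        ys = [tgt_frames[i][d] for i in nbrs]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        var_x = sum((v - mean_x) ** 2 for v in xs)
        if var_x < 1e-12:
            # Degenerate neighbourhood: fall back to the mean target value.
            converted.append(mean_y)
            continue
        cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
        slope = cov_xy / var_x
        # Least-squares line through the neighbourhood, evaluated at x[d].
        converted.append(mean_y + slope * (x[d] - mean_x))
    return converted
```

Because the map is re-estimated for every frame from its own neighbourhood, the mapping can follow local structure in the feature space instead of a single global average, which is one plausible reading of how a local transform could limit over-smoothing.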
Pages: 5