Rich Context Modeling for High Quality HMM-Based TTS

Cited by: 0
Authors
Yan, Zhi-Jie [1]
Qian, Yao [1]
Soong, Frank K. [1]
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
HMM-based TTS; rich context modeling
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a rich context modeling approach to high quality HMM-based speech synthesis. We first analyze the over-smoothing problem in conventional HMMs based on decision-tree tying, and then propose modeling the training speech tokens with rich context models. A special training procedure is adopted for reliable estimation of the rich context model parameters. In synthesis, a search algorithm following a context-based pre-selection is performed to determine the optimal rich context model sequence, which generates natural and crisp output speech. Experimental results show that spectral envelopes synthesized by the rich context models have crisper formant structures and evolve with richer details than those obtained by the conventional models. The speech quality improvement is also perceived by listeners in a subjective preference test, in which 76% of the sentences synthesized using rich context modeling are preferred.
Pages: 1767-1770
Number of pages: 4