Rich Context Modeling for High Quality HMM-Based TTS

Cited by: 0
Authors
Yan, Zhi-Jie [1]
Qian, Yao [1]
Soong, Frank K. [1]
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
HMM-based TTS; rich context modeling
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a rich context modeling approach to high-quality HMM-based speech synthesis. We first analyze the over-smoothing problem in conventional HMMs based on decision-tree tying, and then propose modeling the training speech tokens with rich context models. A special training procedure is adopted for reliable estimation of the rich context model parameters. In synthesis, a search algorithm, applied after a context-based pre-selection, determines the optimal rich context model sequence, which generates natural and crisp output speech. Experimental results show that spectral envelopes synthesized by the rich context models have crisper formant structures and evolve with richer details than those obtained by the conventional models. The speech quality improvement is also perceived by listeners in a subjective preference test, in which 76% of the sentences synthesized using rich context modeling are preferred.
Pages: 1767-1770
Page count: 4