Rich Context Modeling for High Quality HMM-Based TTS

Cited by: 0
Authors
Yan, Zhi-Jie [1]
Qian, Yao [1]
Soong, Frank K. [1]
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
HMM-based TTS; rich context modeling;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a rich context modeling approach to high-quality HMM-based speech synthesis. We first analyze the over-smoothing problem in conventional HMMs based on decision-tree tying, and then propose modeling the training speech tokens with rich context models. A special training procedure is adopted for reliable estimation of the rich context model parameters. In synthesis, a search algorithm following a context-based pre-selection determines the optimal rich context model sequence, which generates natural and crisp output speech. Experimental results show that spectral envelopes synthesized by the rich context models have crisper formant structures and evolve with richer detail than those obtained by the conventional models. Listeners also perceive the speech quality improvement in a subjective preference test, in which 76% of the sentences synthesized using rich context modeling are preferred.
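The abstract describes synthesis as a context-based pre-selection followed by a search for the best model sequence. The toy sketch below illustrates that general two-stage idea only; it is not the paper's algorithm. The `RichModel` class, the field-overlap pre-selection score, the absolute-difference costs, the `guides` reference values, and the `w_join` weight are all hypothetical choices made for illustration, and each model is reduced to a single scalar mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RichModel:
    """Hypothetical stand-in for one rich context model."""
    context: tuple  # assumed context label, e.g. (left, centre, right) phones
    mean: float     # toy 1-D stand-in for a spectral parameter mean


def preselect(target_context, inventory, k):
    """Keep the k models whose context labels share the most fields with the target."""
    def overlap(model):
        return sum(a == b for a, b in zip(model.context, target_context))
    return sorted(inventory, key=overlap, reverse=True)[:k]


def search(targets, guides, inventory, k=2, w_join=1.0):
    """Dynamic-programming search over the pre-selected candidates.

    targets: one target context label per position
    guides:  one assumed reference value per position (illustrative only)
    Returns the candidate sequence with the lowest total cost, where the cost
    is distance to the guide plus w_join times the jump between neighbours.
    """
    cands = [preselect(t, inventory, k) for t in targets]
    cost = [[abs(c.mean - guides[0]) for c in cands[0]]]
    back = [[None] * len(cands[0])]
    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for c in cands[i]:
            def total(j, c=c):
                return cost[i - 1][j] + w_join * abs(c.mean - cands[i - 1][j].mean)
            best_j = min(range(len(cands[i - 1])), key=total)
            row_cost.append(total(best_j) + abs(c.mean - guides[i]))
            row_back.append(best_j)
        cost.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(cands[i][j])
        if i > 0:
            j = back[i][j]
    return path[::-1]
```

Pre-selection keeps the search small: only `k` candidates per position enter the dynamic program, so the cost grows with `k` squared per position rather than with the size of the full model inventory.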
Pages: 1767 - 1770
Page count: 4
Related papers
50 in total
  • [21] Soft context clustering for F0 modeling in HMM-based speech synthesis
    Khorram, Soheil
    Sameti, Hossein
    King, Simon
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015,
  • [22] On the use of context-dependent modeling units for HMM-based offline handwriting recognition
    Fink, Gernot A.
    Ploetz, Thomas
    ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2007, : 729 - 733
  • [24] DIALOGUE CONTEXT SENSITIVE HMM-BASED SPEECH SYNTHESIS
    Tsiakoulis, Pirros
    Breslin, Catherine
    Gasic, Milica
    Henderson, Matthew
    Kim, Dongho
    Szummer, Martin
    Thomson, Blaise
    Young, Steve
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [25] CROSS VALIDATION AND MINIMUM GENERATION ERROR FOR IMPROVED MODEL CLUSTERING IN HMM-BASED TTS
    Xie, Feng-Long
    Wu, Yi-Jian
    Soong, Frank K.
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 60 - 63
  • [26] RICH-CONTEXT UNIT SELECTION (RUS) APPROACH TO HIGH QUALITY TTS
    Yan, Zhi-Jie
    Qian, Yao
    Soong, Frank K.
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4798 - 4801
  • [27] A novel HMM-based TTS system using both continuous HMMS and discrete HMMS
    Yu, Jian
    Zhang, Meng
    Tao, Jianhua
    Wang, Xia
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 709 - +
  • [28] State duration modeling for HMM-based speech synthesis
    Zen, Heiga
    Masuko, Takashi
    Tokuda, Keiichi
    Yoshimura, Takayoshi
    Kobayashi, Takao
    Kitamura, Tadashi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (03): : 692 - 693
  • [29] HMM-based gain modeling for enhancement of speech in noise
    Zhao, David Y.
    Kleijn, W. Bastiaan
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (03): : 882 - 892
  • [30] An HMM Trajectory Tiling (HTT) Approach to High Quality TTS
    Qian, Yao
    Yan, Zhi-jie
    Wu, Yijian
    Soong, Frank
    Zhuang, Xin
    Kong, Shengyi
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 422 - +