CROSS VALIDATION AND MINIMUM GENERATION ERROR FOR IMPROVED MODEL CLUSTERING IN HMM-BASED TTS

Cited by: 0
Authors
Xie, Feng-Long [1]
Wu, Yi-Jian [1]
Soong, Frank K. [1]
Affiliation
[1] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
cross validation; minimum generation error; context clustering; HMM-based synthesis
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In HMM-based speech synthesis, the context-dependent hidden Markov model (HMM) is widely used for its ability to synthesize highly intelligible and fairly smooth speech. However, training HMMs well for all possible contexts is difficult, or even impossible, because the training data inherently cannot cover every context. As a result, the trained models may overfit, and their ability to predict unseen contexts at test time is severely restricted. Recently, cross-validation (CV) has been applied to decision tree-based clustering under the Maximum-Likelihood (ML) criterion and has shown improved robustness in TTS synthesis. In this paper we generalize CV-based decision tree clustering to a different criterion, Minimum Generation Error (MGE). Experimental results show that the generalization to MGE yields better TTS synthesis performance than the baseline systems.
Pages: 60-63
Number of pages: 4
Related papers
50 in total
  • [21] HMM-based TTS for Hanoi Vietnamese: issues in design and evaluation
    Nguyen Thi Thu Trang
    D'Alessandro, Christophe
    Rilliard, Albert
    Tran Do Dat
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2310 - 2314
  • [22] Rich Context Modeling for High Quality HMM-Based TTS
    Yan, Zhi-Jie
    Qian, Yao
    Soong, Frank K.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1767 - 1770
  • [23] HMM-based audio keyword generation
    Xu, M
    Duan, LY
    Cai, J
    Chia, LT
    Xu, CS
    Tian, Q
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2004, PT 3, PROCEEDINGS, 2004, 3333 : 566 - 574
  • [24] Reducing Computational and Memory Cost for HMM-based Embedded TTS System
    Fu, Rong
    Zhao, Zengliang
    Tu, Qixiong
    2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL I, 2010, : 351 - 354
  • [25] HMM-Based Trust Model
    Elsalamouny, Ehab
    Sassone, Vladimiro
    Nielsen, Mogens
    FORMAL ASPECTS IN SECURITY AND TRUST, 2010, 5983 : 21 - +
  • [26] A SPECTRAL SPACE WARPING APPROACH TO CROSS-LINGUAL VOICE TRANSFORMATION IN HMM-BASED TTS
    Wang, Hao
    Soong, Frank
    Meng, Helen
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4874 - 4878
  • [27] An HMM-Based Reputation Model
    ElSalamouny, Ehab
    Sassone, Vladimiro
    ADVANCES IN SECURITY OF INFORMATION AND COMMUNICATION NETWORKS, 2013, 381 : 111 - +
  • [28] Reducing Computational and Memory Cost for HMM-Based Embedded TTS System
    Fu, Rong
    Zhao, Zengliang
    Tu, Qixiong
    APPLIED INFORMATICS AND COMMUNICATION, PT I, 2011, 224 : 602 - +
  • [29] Speech-rate-variable HMM-based Japanese TTS system
    Iwano, K
    Yamada, M
    Togawa, T
    Furui, S
    PROCEEDINGS OF THE 2002 IEEE WORKSHOP ON SPEECH SYNTHESIS, 2002, : 219 - 222
  • [30] Minimum Kullback-Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis
    Ling, Zhen-Hua
    Dai, Li-Rong
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (05): : 1492 - 1502