CROSS VALIDATION AND MINIMUM GENERATION ERROR FOR IMPROVED MODEL CLUSTERING IN HMM-BASED TTS

Cited by: 0
Authors
Xie, Feng-Long [1]
Wu, Yi-Jian [1]
Soong, Frank K. [1]
Affiliation
[1] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
cross validation; minimum generation error; context clustering; HMM-based synthesis
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In HMM-based speech synthesis, context-dependent hidden Markov models (HMMs) are widely used for their ability to synthesize highly intelligible and fairly smooth speech. However, training HMMs well for all possible contexts is difficult, or even impossible, because the training data inherently cannot cover every context. As a result, the trained models may overfit, and their ability to predict unseen contexts at test time is highly restricted. Recently, cross-validation (CV) has been applied to decision tree-based clustering with the maximum-likelihood (ML) criterion and has shown improved robustness in TTS synthesis. In this paper, we generalize CV to decision tree clustering with a different criterion, minimum generation error (MGE). Experimental results show that the generalization to MGE yields better TTS synthesis performance than the baseline systems.
Pages: 60-63
Page count: 4