CROSS VALIDATION AND MINIMUM GENERATION ERROR FOR IMPROVED MODEL CLUSTERING IN HMM-BASED TTS

Authors
Xie, Feng-Long [1]
Wu, Yi-Jian [1]
Soong, Frank K. [1]
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
Keywords
cross validation; minimum generation error; context clustering; HMM-based synthesis
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In HMM-based speech synthesis, context-dependent hidden Markov models (HMMs) are widely used for their ability to synthesize highly intelligible and fairly smooth speech. However, training HMMs well for all possible contexts is difficult, or even impossible, because the training data intrinsically cannot cover every context. As a result, models trained this way may overfit, and their ability to predict unseen contexts at test time is highly restricted. Recently, cross-validation (CV) has been applied to decision tree-based clustering with the Maximum-Likelihood (ML) criterion and has shown improved robustness in TTS synthesis. In this paper we generalize CV to decision tree clustering with a different criterion, Minimum Generation Error (MGE). Experimental results show that the generalization to MGE yields better TTS synthesis performance than the baseline systems.
Pages: 60-63
Number of pages: 4
Related papers
50 in total
  • [11] INCORPORATING DYNAMIC FEATURES INTO MINIMUM GENERATION ERROR TRAINING FOR HMM-BASED SPEECH SYNTHESIS
    Ninh, Duy Khanh
    Morise, Masanori
    Yamashita, Yoichi
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 55 - 59
  • [12] Multi-Centroidal Duration Generation Algorithm for HMM-Based TTS
    Kang, Yongguo
    Li, Jian
    Deng, Yan
    Wang, Miaomiao
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1539 - 1542
  • [13] MINIMUM GENERATION ERROR TRAINING WITH WEIGHTED EUCLIDEAN DISTANCE ON LSP FOR HMM-BASED SPEECH SYNTHESIS
    Lei, Ming
    Ling, Zhen-Hua
    Dai, Li-Rong
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4230 - 4233
  • [14] Minimum generation error criterion considering global/local variance for HMM-based speech synthesis
    Wu, Yi-Jian
    Zen, Heiga
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4621 - 4624
  • [15] A Perceptual Study of Acceleration Parameters in HMM-based TTS
    Chen, Yi-Ning
    Yan, Zhi-Jie
    Soong, Frank K.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 426 - +
  • [16] Measuring the gap between HMM-based ASR and TTS
    Dines, John
    Yamagishi, Junichi
    King, Simon
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1411 - +
  • [17] Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis
    Wu, Yi-Jian
    Tokuda, Keiichi
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 577 - 580
  • [18] PRESERVE ORDERING PROPERTY OF GENERATED LSPS FOR MINIMUM GENERATION ERROR TRAINING IN HMM-BASED SPEECH SYNTHESIS
    Lei, Ming
    Ling, Zhen-Hua
    Dai, Li-Rong
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4712 - 4715
  • [19] Measuring the Gap Between HMM-Based ASR and TTS
    Dines, John
    Yamagishi, Junichi
    King, Simon
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2010, 4 (06) : 1046 - 1058
  • [20] Improved Generation of Fundamental Frequency in HMM-Based Speech Synthesis Using Generation Process Model
    Wang, Miaomiao
    Wen, Miaomiao
    Hirose, Keikichi
    Minematsu, Nobuaki
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2166 - +