CROSS VALIDATION AND MINIMUM GENERATION ERROR FOR IMPROVED MODEL CLUSTERING IN HMM-BASED TTS

Cited by: 0
Authors
Xie, Feng-Long [1]
Wu, Yi-Jian [1]
Soong, Frank K. [1]
Affiliations
[1] Microsoft Res Asia, Beijing, Peoples R China
Keywords
cross validation; minimum generation error; context clustering; HMM-based synthesis
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In HMM-based speech synthesis, context-dependent hidden Markov models (HMMs) are widely used for their capability to synthesize highly intelligible and fairly smooth speech. However, training HMMs well for all possible contexts is difficult, or even impossible, because the training data intrinsically cannot cover every context. As a result, the trained models may overfit, and their capability to predict unseen contexts at test time is highly restricted. Recently, cross-validation (CV) has been explored and applied to decision tree-based clustering with the Maximum-Likelihood (ML) criterion and has shown improved robustness in TTS synthesis. In this paper we generalize CV to decision tree clustering with a different criterion, Minimum Generation Error (MGE). Experimental results show that the generalization to MGE yields better TTS synthesis performance than the baseline systems.
Pages: 60-63
Page count: 4