CROSS VALIDATION AND MINIMUM GENERATION ERROR FOR IMPROVED MODEL CLUSTERING IN HMM-BASED TTS

Cited by: 0

Authors:

Xie, Feng-Long [1]

Wu, Yi-Jian [1]

Soong, Frank K. [1]

Affiliation:

[1] Microsoft Research Asia, Beijing, People's Republic of China

Source:

2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING | 2012

Keywords:

cross validation; minimum generation error; context clustering; HMM-based synthesis

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In HMM-based speech synthesis, context-dependent hidden Markov models (HMMs) are widely used for their capability to synthesize highly intelligible and fairly smooth speech. However, training HMMs well for all possible contexts is difficult, or even impossible, because the training data inherently cannot cover every context. As a result, the trained models may overfit, and their ability to predict unseen contexts at test time is highly restricted. Recently, cross-validation (CV) has been applied to decision tree-based clustering with the Maximum-Likelihood (ML) criterion and has shown improved robustness in TTS synthesis. In this paper, we generalize CV to decision tree clustering with a different criterion, Minimum Generation Error (MGE). Experimental results show that the generalization to MGE yields better TTS synthesis performance than the baseline systems.
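
The idea summarized in the abstract, choosing decision tree splits by an error measured on held-out data rather than on the training data itself, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a scalar acoustic feature per sample and uses the squared error between each held-out sample and the mean of the matching training-side cluster as a simplified stand-in for a generation error; an actual MGE criterion would compare generated and natural parameter trajectories. All names (`split_error`, `cv_error`, `best_question`, the toy contexts and questions) are hypothetical.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def split_error(train, held_out, question):
    """Squared error on held_out when each side of the split is predicted
    by the mean of the matching side of the training data."""
    parent_mean = mean([y for _, y in train])
    error = 0.0
    for answer in (True, False):
        train_side = [y for ctx, y in train if question(ctx) == answer]
        held_side = [y for ctx, y in held_out if question(ctx) == answer]
        if not held_side:
            continue
        # Fall back to the parent mean if the training side is empty.
        prediction = mean(train_side) if train_side else parent_mean
        error += sum((y - prediction) ** 2 for y in held_side)
    return error


def cv_error(data, question, k=5):
    """K-fold cross-validated error of splitting `data` with `question`."""
    folds = [data[i::k] for i in range(k)]
    total = 0.0
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        total += split_error(train, folds[i], question)
    return total


def best_question(data, questions, k=5):
    """Name of the question with the lowest cross-validated error."""
    return min(questions, key=lambda name: cv_error(data, questions[name], k))


# Toy data (assumed for illustration): the feature depends on "vowel" only.
rng = random.Random(0)
data = []
for i in range(40):
    ctx = {"vowel": i % 2 == 0, "stressed": (i // 2) % 2 == 0}
    y = (1.0 if ctx["vowel"] else -1.0) + rng.gauss(0.0, 0.1)
    data.append((ctx, y))

questions = {
    "is_vowel": lambda ctx: ctx["vowel"],
    "is_stressed": lambda ctx: ctx["stressed"],
}

chosen = best_question(data, questions)
```

On this toy data the held-out error favors the `is_vowel` question, since only that context attribute actually influences the feature. A full clustering procedure would apply the same selection recursively at every node and would also need a stopping rule, which this sketch leaves out.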

Citation:

Pages: 60-63

Page count: 4

50 items in total

[1] CROSS-VALIDATION BASED DECISION TREE CLUSTERING FOR HMM-BASED TTS
Zhang, Yu
Yan, Zhi-Jie
Soong, Frank K.
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4602 - 4605
[2] An improved minimum generation error based model adaptation for HMM-based speech synthesis
Wu, Yi-Jian
Qin, Long
Tokuda, Keiichi
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1727 - +
[3] A Minimum V/U Error Approach to F0 Generation in HMM-based TTS
Qian, Yao
Soong, Frank
Wang, Miaomiao
Wu, Zhizheng
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 400 - 403
[4] Minimum generation error training for HMM-based speech synthesis
Wu, Yi-Jian
Wang, Ren-Hua
2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 89 - 92
[5] Decision Tree Based Context Clustering with Cross Likelihood Ratio for HMM-based TTS
Jung, Chi-Sang
Kang, Hong-Goo
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2013, 32 (02): : 174 - 180
[6] Model Adaptation for HMM-Based Speech Synthesis under Minimum Generation Error Criterion
Qin, Long
Wu, Yi-Jian
Ling, Zhen-Hua
Wang, Ren-Hua
ISM: 2008 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, 2008, : 539 - +
[7] Minimum generation error linear regression based model adaptation for HMM-based speech synthesis
Qin, Long
Wu, Yi-Jian
Ling, Zhen-Hua
Wang, Ren-Hua
Dai, Li-Rong
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 3953 - +
[8] Minimum generation error based optimization of HMM model clustering for speech synthesis
Lu, Heng
Ling, Zhen-Hua
Lei, Ming
Dai, Li-Rong
Wang, Ren-Hua
Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2010, 23 (06): : 822 - 828
[9] Sinusoidal model parameterization for HMM-based TTS system
Shechtman, Slava
Sorin, Alex
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 805 - 808
[10] IMPROVED MODELING FOR F0 GENERATION AND V/U DECISION IN HMM-BASED TTS
Zhang, Qingqing
Soong, Frank
Qian, Yao
Yan, Zhijie
Pan, Jielin
Yan, Yonghong
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4606 - 4609
