CROSS VALIDATION AND MINIMUM GENERATION ERROR FOR IMPROVED MODEL CLUSTERING IN HMM-BASED TTS

Cited by: 0

Authors:

Xie, Feng-Long [1]

Wu, Yi-Jian [1]

Soong, Frank K. [1]

Affiliations:

[1] Microsoft Research Asia, Beijing, People's Republic of China

Source:

2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING | 2012

Keywords:

cross validation; minimum generation error; context clustering; HMM-based synthesis

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In HMM-based speech synthesis, context-dependent hidden Markov models (HMMs) are widely used for their capability to synthesize highly intelligible and fairly smooth speech. However, training HMMs well for all possible contexts is difficult, or even impossible, because the training data inherently cannot cover every context. As a result, models trained this way may overfit, and their capability to predict unseen contexts at test time is highly restricted. Recently, cross-validation (CV) has been applied to decision tree-based clustering with the Maximum-Likelihood (ML) criterion and has shown improved robustness in TTS synthesis. In this paper we generalize CV to decision tree clustering with a different criterion, Minimum Generation Error (MGE). Experimental results show that the generalization to MGE yields better TTS synthesis performance than the baseline systems.

Citation:

Pages: 60-63

Page count: 4
