A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis

Cited by: 0

Authors:

Maia, Ranniery

Toda, Tomoki

Tokuda, Keiichi

Sakai, Shinsuke

Nakamura, Satoshi

Affiliations:

Source:

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009

Keywords:

speech synthesis; HMM-based speech synthesis; decision tree-based clustering; residual modeling;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a decision tree-based algorithm to cluster residual segments assuming an excitation model based on state-dependent filtering of pulse train and white noise. The decision tree construction principle is the same as the one applied to speech recognition. Here parent nodes are split using the residual maximum likelihood criterion. Once these excitation decision trees are constructed for residual signals segmented by full context models, using questions related to the full context of the training sentences, they can be utilized for excitation modeling in speech synthesis based on hidden Markov models (HMM). Experimental results have shown that the algorithm in question is very effective in terms of clustering residual signals given segmentation, pitch marks and full context questions, resulting in filters with good residual modeling properties.
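The splitting principle named in the abstract (a parent node is split by the context question that most increases likelihood) can be illustrated with a minimal sketch. This is not the paper's implementation: the record gives no formula for the residual likelihood, so a single 1-D Gaussian fitted by maximum likelihood stands in for it here. The names `gaussian_loglik`, `best_split`, `build_tree`, the `min_gain` stopping threshold, and the toy questions and data are all hypothetical.

```python
import math


def gaussian_loglik(values, var_floor=1e-6):
    """Log-likelihood of values under one ML-fitted 1-D Gaussian.

    Assumption: this is a stand-in for the residual likelihood
    used in the paper, which is not specified in this record.
    """
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, var_floor)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def best_split(segments, questions):
    """Return (gain, question_name, yes, no) for the best question, or None."""
    parent = gaussian_loglik([s["value"] for s in segments])
    best = None
    for name, predicate in questions.items():
        yes = [s for s in segments if predicate(s["context"])]
        no = [s for s in segments if not predicate(s["context"])]
        if not yes or not no:
            continue  # a question that does not divide the node is useless
        gain = (gaussian_loglik([s["value"] for s in yes])
                + gaussian_loglik([s["value"] for s in no])
                - parent)
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    return best


def build_tree(segments, questions, min_gain=1.0):
    """Greedily split nodes while the likelihood gain is at least min_gain."""
    split = best_split(segments, questions)
    if split is None or split[0] < min_gain:
        return {"leaf": segments}
    _, name, yes, no = split
    return {"question": name,
            "yes": build_tree(yes, questions, min_gain),
            "no": build_tree(no, questions, min_gain)}


# Hypothetical context questions and toy data, for illustration only.
questions = {
    "is_vowel": lambda c: c["phone"] in "aeiou",
    "is_stressed": lambda c: c["stressed"],
}
segments = [
    {"context": {"phone": p, "stressed": st}, "value": v}
    for p, st, v in [
        ("a", True, 0.0), ("a", False, 0.1), ("e", True, -0.1), ("e", False, 0.05),
        ("s", True, 10.0), ("s", False, 10.1), ("t", True, 9.9), ("t", False, 10.05),
    ]
]
tree = build_tree(segments, questions, min_gain=5.0)
print(tree["question"])
```

In this toy setup the values differ mainly between vowels and consonants, so the vowel question gives the larger likelihood gain at the root. A real system would replace the Gaussian with the excitation model's own likelihood and use a full-context question set.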


Pages: 1743-1746

Number of pages: 4
