A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis

Cited by: 0

Authors:

Maia, Ranniery

Toda, Tomoki

Tokuda, Keiichi

Sakai, Shinsuke

Nakamura, Satoshi

Affiliations:

Source:

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009

Keywords:

speech synthesis; HMM-based speech synthesis; decision tree-based clustering; residual modeling;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a decision tree-based algorithm to cluster residual segments assuming an excitation model based on state-dependent filtering of pulse train and white noise. The decision tree construction principle is the same as the one applied to speech recognition. Here parent nodes are split using the residual maximum likelihood criterion. Once these excitation decision trees are constructed for residual signals segmented by full context models, using questions related to the full context of the training sentences, they can be utilized for excitation modeling in speech synthesis based on hidden Markov models (HMM). Experimental results have shown that the algorithm in question is very effective in terms of clustering residual signals given segmentation, pitch marks, and full context questions, resulting in filters with good residual modeling properties.
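
The splitting principle mentioned in the abstract (a parent node is split by the context question that most increases likelihood) can be illustrated with a minimal sketch. This is not the paper's algorithm: the paper's criterion is a residual likelihood under a pulse-train-plus-noise excitation model, whereas the sketch below swaps in a single Gaussian fitted to one scalar feature per segment. The names `gaussian_loglik`, `best_split`, `build_tree`, the question names, and the thresholds `min_count` and `min_gain` are all hypothetical.

```python
import math

def gaussian_loglik(values):
    """Log-likelihood of values under one Gaussian fitted by maximum likelihood."""
    n = len(values)
    mean = sum(values) / n
    # Variance floor avoids log(0) for near-constant clusters (assumed value).
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-8)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def best_split(items, questions, min_count=2):
    """Return (gain, question_name, yes_items, no_items) or None.

    items: list of (context_dict, scalar_feature) pairs.
    questions: dict mapping a question name to a predicate on the context.
    """
    parent_ll = gaussian_loglik([v for _, v in items])
    best = None
    for name, pred in questions.items():
        yes = [it for it in items if pred(it[0])]
        no = [it for it in items if not pred(it[0])]
        if len(yes) < min_count or len(no) < min_count:
            continue  # skip splits that leave a child with too little data
        gain = (gaussian_loglik([v for _, v in yes])
                + gaussian_loglik([v for _, v in no])
                - parent_ll)
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    return best

def build_tree(items, questions, min_gain=1.0):
    """Greedy top-down tree: split while the likelihood gain is large enough."""
    split = best_split(items, questions)
    if split is None or split[0] < min_gain:
        return {"leaf": True, "count": len(items)}
    _, name, yes, no = split
    return {"leaf": False, "question": name,
            "yes": build_tree(yes, questions, min_gain),
            "no": build_tree(no, questions, min_gain)}

# Toy data: the feature depends on voicing, not on stress.
items = [({"voiced": True, "stressed": True}, 10.0),
         ({"voiced": True, "stressed": False}, 10.2),
         ({"voiced": True, "stressed": True}, 9.8),
         ({"voiced": True, "stressed": False}, 10.1),
         ({"voiced": False, "stressed": True}, 0.1),
         ({"voiced": False, "stressed": False}, -0.2),
         ({"voiced": False, "stressed": True}, 0.0),
         ({"voiced": False, "stressed": False}, 0.2)]
questions = {"is_voiced": lambda c: c["voiced"],
             "is_stressed": lambda c: c["stressed"]}
tree = build_tree(items, questions, min_gain=5.0)
```

In this toy setup the root is split by the voicing question because it separates the two value groups, while the stress question yields almost no likelihood gain. A stopping threshold such as `min_gain` plays the role that a stopping criterion plays in context clustering generally; the value used here is arbitrary.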

Pages: 1743-1746

Number of pages: 4

