USE OF FUNDAMENTAL FREQUENCIES SHAPED BY GENERATION PROCESS MODEL FOR HMM-BASED SPEECH SYNTHESIS

被引：0

作者：

Hirose, Keikichi ^{[1
]}

Hashimoto, Hiroya ^{[2
]}

Hyakutake, Kyota ^{[2
]}

Saito, Daisuke ^{[1
]}

Minematsu, Nobuaki ^{[2
]}

机构：

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Tokyo, Japan

[2] Univ Tokyo, Dept Elect Engn & Informat Syst, Grad Sch Engn, Tokyo, Japan

来源：

2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) | 2014年

关键词：

Generation process model; HMM-based speech synthesis; F-0; residual; Flexible F-0 control;

D O I：

暂无

中图分类号：

TM [电工技术]; TN [电子技术、通信技术];

学科分类号：

0808 ; 0809 ;

摘要：

Generation process model of fundamental frequency (F-0) contours is known to represent global movements of F-0's keeping a clear relation with linguistic information of utterances. While HMMbased speech synthesis can generate a good quality of speech, problems, which arise from frame-by-frame processing, are pointed out. These problems are expected to be solved by incorporating the model constraints. A method is developed to use F-0 contours approximated by the model for HMM training instead of observed F-0 contours. A clear improvement in the quality of synthetic speech is shown through listening experiments. In the method, fragments of F-0 contours not represented by the model (F-0 residuals) are ignored. A scheme is further introduced to cope with the issue; F-0 residuals are also included in the training and synthesis processes of HMM-based speech synthesis, and the generated F-0 residuals are added to the model-based Fo's before the waveform generation. The model constraint has another merit; relations between generated F-0 contours and texts are clear, and it is possible to add linguistic information such as emphasis to synthetic speech, or to change speaking styles through manipulating Fo's in the F-0 model framework. Several experimental results supporting the advantages of the method are shown.

引用

页码：555 / 560

页数：6

共 50 条

[1] Use of Generation Process Model for Synthesizing Fundamental Frequency Contours in HMM-based Speech Synthesis
Hirose, Keikichi
Hashimoto, Hiroya
Ikeshima, Jun
Minematsu, Nobuaki
PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 575 - +
[2] Control of Fundamental Frequency Contours Using the Generation Process Model in HMM-Based Speech Synthesis
Matsuda, Tetsuya
Hirose, Keikichi
Minematsu, Nobuaki
2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 617 - 620
[3] REPRESENTING FUNDAMENTAL FREQUENCY CONTOURS GENERATED BY HMM-BASED SPEECH SYNTHESIS USING GENERATION PROCESS MODEL
Hirose, Keikichi
Matsuda, Tatsuya
Hashimoto, Hiroya
Minematsu, Nobuaki
2011 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2011,
[4] Improved Generation of Fundemental Frequency in HMM-Based Speech Synthesis Using Generation Process Model
Wang, Miaomiao
Wen, Miaomiao
Hirose, Keikichi
Minematsu, Nobuaki
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2166 - +
[5] Fundamental Frequency Contour Reshaping in HMM-based Speech Synthesis and Realization of Prosodic Focus Using Generation Process Model
Hirose, Keikichi
Hashimoto, Hiroya
Ikeshima, Jun
Minematsu, Nobuaki
PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SPEECH PROSODY, VOLS I AND II, 2012, : 171 - 174
[6] Improved Automatic Extraction of Generation Process Model Commands and Its use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis
Hashimoto, Hiroya
Hirose, Keikichi
Minematsu, Nobuaki
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 458 - 461
[7] Speech parameter generation algorithms for HMM-based speech synthesis
Tokuda, K
Yoshimura, T
Masuko, T
Kobayashi, T
Kitamura, T
2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1315 - 1318
[8] An improved minimum generation error based model adaptation for HMM-based speech synthesis
Wu, Yi-Jian
Qin, Long
Tokuda, Keiichi
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1727 - +
[9] A trainable excitation model for HMM-based speech synthesis
Maia, R.
Toda, T.
Zen, H.
Nankaku, Y.
Tokuda, K.
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1125 - +
[10] Minimum generation error training for HMM-based speech synthesis
Wu, Yi-Jian
Wang, Ren-Hua
2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 89 - 92

← 1 2 3 4 5 →