Improved Generation of Fundamental Frequency in HMM-Based Speech Synthesis Using Generation Process Model

Times cited: 0

Authors:

Wang, Miaomiao [1]

Wen, Miaomiao [1]

Hirose, Keikichi

Minematsu, Nobuaki

Affiliation:

[1] Univ Tokyo, Dept Elect Engn & Informat Syst, Tokyo, Japan

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 | 2010

Keywords:

Mandarin speech synthesis; F0; generation; generation process model; HMM-based TTS; INSTANTANEOUS-FREQUENCY

DOI:

Not available

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic and communication technology]

Discipline codes:

0808; 0809

Abstract:

The HMM-based text-to-speech (TTS) system can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the quality of the synthetic speech degrades when the feature vectors used in training are noisy. Among all noisy features, pitch tracking errors and the resulting flawed voiced/unvoiced (VU) decisions are the two key factors behind voice quality problems. Pitch tracking errors occur more often in Mandarin vowels of Tone 3 and Tone 4. In addition, because F0 values are discontinuous between voiced and unvoiced regions, standard HMMs cannot be used for F0 modeling. The currently preferred solution is the multi-space distribution HMM (MSD-HMM), in which discrete distributions model the VU decision and continuous Gaussian distributions model F0 within voiced regions. Because of this assumption that F0 is undefined in unvoiced regions, and because of the special structure of the MSD-HMM, the accuracy of the generated F0 values is limited. In this paper, an F0 generation process model is used to re-estimate F0 values in regions with pitch tracking errors as well as in unvoiced regions. A priori VU knowledge is imposed on each Mandarin phoneme and used for the VU decision. F0 can then be modeled within the standard HMM framework.
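The abstract's central idea, replacing undefined or unreliable F0 frames with values from an F0 generation process model so that every frame carries an F0 value, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the generation process model takes the familiar Fujisaki-style form (a base value plus phrase and tone components in the log F0 domain, with Mandarin tone commands allowed either polarity) and that the command parameters have already been extracted. The constants `ALPHA`, `BETA`, `GAMMA`, the pinyin VU table, the frame shift and all function names are hypothetical.

```python
import numpy as np

# Hypothetical model constants: typical textbook values, NOT values from the paper.
ALPHA = 3.0   # phrase control mechanism, natural angular frequency (1/s)
BETA = 20.0   # tone control mechanism, natural angular frequency (1/s)
GAMMA = 0.9   # ceiling of the tone component response

# Hypothetical a-priori VU table for pinyin initials: only the sonorants
# m, n, l, r are voiced; any unit that is not an initial (a final) counts as voiced.
INITIALS = {"b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
            "j", "q", "x", "zh", "ch", "sh", "r", "z", "c", "s"}
VOICED_INITIALS = {"m", "n", "l", "r"}


def phrase_response(t, alpha=ALPHA):
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, and 0 for t < 0."""
    tp = np.maximum(np.asarray(t, dtype=float), 0.0)  # clamping gives 0 before onset
    return alpha**2 * tp * np.exp(-alpha * tp)


def tone_response(t, beta=BETA, gamma=GAMMA):
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, and 0 for t < 0."""
    tp = np.maximum(np.asarray(t, dtype=float), 0.0)
    return np.minimum(1.0 - (1.0 + beta * tp) * np.exp(-beta * tp), gamma)


def log_f0_contour(times, fb, phrase_cmds, tone_cmds):
    """ln F0(t) = ln Fb + sum_i Ap_i Gp(t - T0_i) + sum_j At_j [Ga(t - T1_j) - Ga(t - T2_j)].

    phrase_cmds: list of (T0, Ap); tone_cmds: list of (T1, T2, At).
    At may be negative (Mandarin tone commands can take either polarity).
    The result is defined at every frame, voiced or not.
    """
    times = np.asarray(times, dtype=float)
    lf0 = np.full(times.shape, np.log(fb))
    for t0, ap in phrase_cmds:
        lf0 += ap * phrase_response(times - t0)
    for t1, t2, at in tone_cmds:
        lf0 += at * (tone_response(times - t1) - tone_response(times - t2))
    return lf0


def phoneme_is_voiced(ph):
    """A-priori VU decision from phoneme identity instead of the pitch tracker."""
    return ph in VOICED_INITIALS if ph in INITIALS else True


def vu_mask(times, segments):
    """segments: list of (phoneme, start_s, end_s); frames outside all segments are unvoiced."""
    times = np.asarray(times, dtype=float)
    mask = np.zeros(times.shape, dtype=bool)
    for ph, start, end in segments:
        if phoneme_is_voiced(ph):
            mask |= (times >= start) & (times < end)
    return mask


def continuous_lf0(observed_lf0, model_lf0, reliable):
    """Keep tracked log F0 on reliable frames; use the model value on unvoiced
    frames and suspected tracking errors, so the result has no undefined frames."""
    observed = np.asarray(observed_lf0, dtype=float)
    keep = np.asarray(reliable, dtype=bool) & np.isfinite(observed)
    return np.where(keep, observed, model_lf0)


if __name__ == "__main__":
    times = np.arange(0.0, 1.0, 0.005)  # assumed 5 ms frame shift
    model = log_f0_contour(times, fb=80.0, phrase_cmds=[(0.0, 0.5)],
                           tone_cmds=[(0.1, 0.3, 0.4), (0.4, 0.6, -0.3)])
    voiced = vu_mask(times, [("sh", 0.0, 0.1), ("a", 0.1, 0.45),
                             ("m", 0.45, 0.5), ("a", 0.5, 0.7)])
    observed = np.where(voiced, model, np.nan)  # stand-in tracker output, undefined when unvoiced
    filled = continuous_lf0(observed, model, voiced)
    print(np.isfinite(filled).all())  # True: every frame now has an F0 value
```

Filling gaps from the model, rather than interpolating between neighbouring voiced frames, keeps the substituted values consistent with the phrase and tone structure of the utterance. Estimating the command parameters from the reliable frames, which a real system would need, is outside the scope of this sketch.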


Pages: 2166-+

Number of pages: 2
