Improved Generation of Fundamental Frequency in HMM-Based Speech Synthesis Using Generation Process Model

Cited by: 0
Authors
Wang, Miaomiao [1]
Wen, Miaomiao [1]
Hirose, Keikichi
Minematsu, Nobuaki
Affiliations
[1] Univ Tokyo, Dept Elect Engn & Informat Syst, Tokyo, Japan
Keywords
Mandarin speech synthesis; F0; generation; generation process model; HMM-based TTS; INSTANTANEOUS-FREQUENCY
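DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification codes
0808; 0809
Abstract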
The HMM-based text-to-speech (TTS) system can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the quality of the synthetic speech degrades when the feature vectors used in training are noisy. Among all noisy features, pitch tracking errors and the corresponding flawed voiced/unvoiced (VU) decisions are the two key factors in voice quality problems. Pitch tracking errors occur more often in Mandarin vowels of Tone 3 and Tone 4. In addition, because F0 values are discontinuous across voiced and unvoiced regions, standard HMMs cannot be used directly for F0 modeling. The currently preferred solution is the multi-space distribution HMM (MSD-HMM), in which discrete distributions model the VU decision and continuous Gaussian distributions model F0 within the voiced regions. Because of the assumption that F0 is undefined in unvoiced regions and the special structure of the MSD-HMM, the generated F0 values are limited in accuracy. In this paper, an F0 generation process model is used to re-estimate F0 values in regions with pitch tracking errors, as well as in unvoiced regions. Prior knowledge of VU is imposed on each Mandarin phoneme and used for the VU decision. F0 can then be modeled within the standard HMM framework.
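The abstract does not spell out the generation process model. The following is a minimal sketch, assuming it refers to the Fujisaki command-response model, in which log F0 is a baseline plus phrase and tone (accent) command responses. All function names, the constants `alpha`, `beta`, `gamma`, and the example commands are illustrative assumptions, not values from the paper; `fill_unvoiced` is a hypothetical helper showing how a model contour could supply F0 values where the tracker reports none.

```python
import math

def phrase_response(t, alpha=3.0):
    # Impulse response of the phrase control: alpha^2 * t * exp(-alpha * t) for t >= 0.
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0

def tone_response(t, beta=20.0, gamma=0.9):
    # Step response of the tone/accent control, capped at the ceiling gamma.
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def log_f0(t, fb, phrase_cmds, tone_cmds):
    # phrase_cmds: list of (onset time, magnitude)
    # tone_cmds:   list of (onset time, offset time, amplitude)
    value = math.log(fb)
    for t0, ap in phrase_cmds:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in tone_cmds:
        value += aa * (tone_response(t - t1) - tone_response(t - t2))
    return value

def model_contour(n_frames, frame_shift, fb, phrase_cmds, tone_cmds):
    # Continuous F0 contour in Hz, one value per frame.
    return [math.exp(log_f0(i * frame_shift, fb, phrase_cmds, tone_cmds))
            for i in range(n_frames)]

def fill_unvoiced(observed, model):
    # Hypothetical helper: keep tracked F0 where it exists (> 0),
    # otherwise fall back to the model value for that frame.
    return [o if o > 0.0 else m for o, m in zip(observed, model)]

# Illustrative commands only (assumed, not taken from the paper).
contour = model_contour(
    n_frames=100, frame_shift=0.005, fb=120.0,
    phrase_cmds=[(0.0, 0.5)],
    tone_cmds=[(0.2, 0.4, 0.3)],
)
```

Because the model contour is defined at every frame, the filled sequence has no undefined F0 values, which is what allows a standard (non-MSD) HMM to model it.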
Pages: 2166 / +
Page count: 2