USE OF FUNDAMENTAL FREQUENCIES SHAPED BY GENERATION PROCESS MODEL FOR HMM-BASED SPEECH SYNTHESIS

Cited by: 0
Authors
Hirose, Keikichi [1]
Hashimoto, Hiroya [2]
Hyakutake, Kyota [2]
Saito, Daisuke [1]
Minematsu, Nobuaki [2]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Tokyo, Japan
[2] Univ Tokyo, Dept Elect Engn & Informat Syst, Grad Sch Engn, Tokyo, Japan
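Keywords
Generation process model; HMM-based speech synthesis; F0; residual; Flexible F0 control
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract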
The generation process model of fundamental frequency (F0) contours is known to represent the global movements of F0 while keeping a clear relation to the linguistic information of utterances. Although HMM-based speech synthesis can generate speech of good quality, problems arising from its frame-by-frame processing have been pointed out. These problems are expected to be solved by incorporating the model constraints. A method is developed that uses F0 contours approximated by the model, instead of observed F0 contours, for HMM training. Listening experiments show a clear improvement in the quality of synthetic speech. This method ignores the fragments of F0 contours that the model does not represent (F0 residuals). A scheme is therefore introduced to address this issue: F0 residuals are also included in the training and synthesis processes of HMM-based speech synthesis, and the generated F0 residuals are added to the model-based F0 values before waveform generation. The model constraint has another merit: the relations between generated F0 contours and texts are clear, so it is possible to add linguistic information such as emphasis to synthetic speech, or to change speaking styles, by manipulating F0 within the F0 model framework. Several experimental results supporting the advantages of the method are shown.
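The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes the generation process model is the command-response (Fujisaki) formulation, in which log F0 is a baseline plus phrase and accent components, and it assumes the residual is taken in the log-F0 domain. The function names, the command tuples, and the constants (alpha = 3.0, beta = 20.0, gamma = 0.9, values commonly used in the literature) are illustrative choices, not details taken from this record.

```python
import math

# Commonly used constants for the command-response model (assumed, not from the record).
ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def model_log_f0(times, base_hz, phrase_cmds, accent_cmds):
    """Model-based log F0 contour.

    phrase_cmds: list of (magnitude, onset_time) pairs (hypothetical layout).
    accent_cmds: list of (amplitude, onset_time, offset_time) triples.
    """
    contour = []
    for t in times:
        value = math.log(base_hz)
        value += sum(a * phrase_response(t - t0) for a, t0 in phrase_cmds)
        value += sum(
            a * (accent_response(t - t1) - accent_response(t - t2))
            for a, t1, t2 in accent_cmds
        )
        contour.append(value)
    return contour


def split_residual(observed_log_f0, model):
    """F0 residual: the part of the observed contour the model does not represent."""
    return [o - m for o, m in zip(observed_log_f0, model)]


def recombine(model, residual):
    """Add the residual back to the model-based contour and return F0 in Hz."""
    return [math.exp(m + r) for m, r in zip(model, residual)]
```

In this sketch, training on `model_log_f0` output alone corresponds to the first method in the abstract, while modelling the output of `split_residual` as a separate stream and calling `recombine` before waveform generation corresponds to the extended scheme. Emphasis or style changes would amount to editing the command magnitudes and timings before regenerating the contour.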
Pages: 555-560
Number of pages: 6