USE OF FUNDAMENTAL FREQUENCIES SHAPED BY GENERATION PROCESS MODEL FOR HMM-BASED SPEECH SYNTHESIS

Authors
Hirose, Keikichi [1]
Hashimoto, Hiroya [2]
Hyakutake, Kyota [2]
Saito, Daisuke [1]
Minematsu, Nobuaki [2]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Tokyo, Japan
[2] Univ Tokyo, Dept Elect Engn & Informat Syst, Grad Sch Engn, Tokyo, Japan
Source
2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) | 2014
Keywords
Generation process model; HMM-based speech synthesis; F0; residual; Flexible F0 control
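DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract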
The generation process model of fundamental frequency (F0) contours is known to represent the global movements of F0 while keeping a clear relation to the linguistic information of utterances. Although HMM-based speech synthesis can generate speech of good quality, problems arising from its frame-by-frame processing have been pointed out. These problems are expected to be solved by incorporating the model constraints. A method is developed that uses F0 contours approximated by the model, instead of observed F0 contours, for HMM training. Listening experiments show a clear improvement in the quality of synthetic speech. This method, however, ignores the fragments of F0 contours not represented by the model (F0 residuals). A scheme is therefore introduced to address this issue: F0 residuals are also included in the training and synthesis processes of HMM-based speech synthesis, and the generated F0 residuals are added to the model-based F0 values before waveform generation. The model constraint has another merit: because the relations between generated F0 contours and texts are clear, it is possible to add linguistic information such as emphasis to synthetic speech, or to change speaking styles, by manipulating F0 within the F0 model framework. Several experimental results supporting the advantages of the method are shown.
Pages: 555-560
Number of pages: 6