A Targets-based Superpositional Model of Fundamental Frequency Contours Applied to HMM-based Speech Synthesis

Cited by: 0
Authors
Ni, Jinfu [1]
Shiga, Yoshinori [1]
Hori, Chiori [1]
Kidawara, Yutaka [1]
Affiliations
[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Universal Commun Res Inst, Kyoto, Japan
Keywords
Prosody modeling; Superpositional F0 model; Continuous F0 modeling; HMM-based speech synthesis;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Superpositional models of fundamental frequency (F0) contours, as exemplified by the Fujisaki model, represent the F0 movements of speech well while keeping a clear relation to the linguistic information of utterances. HMM-based speech synthesis is therefore expected to improve by exploiting the merits of a superpositional model. In this paper, a targets-based superpositional model is proposed in the light of the Fujisaki model. Both the accent and phrase components are parameterized by their own low and high targets, which allow flexible interaction between the two components. Owing to this flexible interaction, the new method treats complex F0 movements such as low digging, varying declination, and final lowering consistently, simply by adjusting parameter values. This facilitates extraction of the model parameters from observed F0 contours, which is one of the major problems preventing the use of the Fujisaki model. Extraction of the target parameters is evaluated on a Japanese speech corpus, and the F0 contours generated by the model are used for HMM training in place of the original contours. A listening test of synthetic speech indicates significant improvements in speech quality. Micro-prosodic effects are also investigated; the results show that adding micro-prosody to the generated F0 contours does not significantly improve speech quality.
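For readers unfamiliar with the superpositional idea the abstract builds on, the following is a minimal sketch of the classic Fujisaki model, in which log F0 is a baseline plus summed phrase and accent components. It is not the targets-based model proposed in the paper, whose equations are not given in this record. The function names, the command tuples, and the constants (alpha = 3.0, beta = 20.0, gamma = 0.9, a 120 Hz baseline) are illustrative assumptions.

```python
import math


def phrase_response(t, alpha=3.0):
    """Phrase impulse response: Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent step response: Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_log_f0(t, base_hz, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + sum Ap*Gp(t - T0) + sum Aa*(Ga(t - T1) - Ga(t - T2))."""
    value = math.log(base_hz)
    for onset, amplitude in phrase_cmds:
        value += amplitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        value += amplitude * (accent_response(t - onset) - accent_response(t - offset))
    return value


# Hypothetical commands: one phrase impulse at 0 s, one accent from 0.3 s to 0.8 s.
phrase_cmds = [(0.0, 0.5)]
accent_cmds = [(0.3, 0.8, 0.4)]

# F0 in Hz sampled every 10 ms over 1.5 s.
contour = [
    math.exp(fujisaki_log_f0(i * 0.01, 120.0, phrase_cmds, accent_cmds))
    for i in range(150)
]
```

Because the components add in the log domain, fitting this model to an observed contour means separating overlapping phrase and accent contributions, which is the parameter-extraction difficulty the abstract refers to.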
Pages: 1051-1055
Number of pages: 5