A Targets-based Superpositional Model of Fundamental Frequency Contours Applied to HMM-based Speech Synthesis

Cited by: 0
Authors
Ni, Jinfu [1]
Shiga, Yoshinori [1]
Hori, Chiori [1]
Kidawara, Yutaka [1]
Affiliations
[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Universal Commun Res Inst, Kyoto, Japan
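Keywords
Prosody modeling; Superpositional F0 model; Continuous F0 modeling; HMM-based speech synthesis
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract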
Superpositional models of fundamental frequency (F0) contours, such as the Fujisaki model, represent the F0 movements of speech well while keeping a clear relation to the linguistic information of utterances. HMM-based speech synthesis is therefore expected to improve by exploiting the merits of a superpositional model. This paper proposes a targets-based superpositional model in the light of the Fujisaki model. Both the accent and phrase components are parameterized by separately defined low and high targets, which allow flexible interaction between the two components. Owing to this flexible interaction, the new method consistently handles complex F0 movements such as low digging, varying declination, and final lowering simply by adjusting parameter values. This facilitates extraction of the model parameters from observed F0 contours, parameter extraction being one of the major problems preventing the use of the Fujisaki model. Extraction of the target parameters is evaluated on a Japanese speech corpus, and the F0 contours generated by the model are used for HMM training in place of the original contours. A listening test of the synthetic speech indicates significant improvements in speech quality. Micro-prosodic effects are also investigated: the results show that adding micro-prosody to the generated F0 contours does not significantly improve speech quality.
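For orientation, the superposition idea that the abstract builds on can be sketched with the classic Fujisaki formulation, in which phrase and accent components are added to a baseline in the log-F0 domain. This is not the targets-based model proposed in the paper, whose equations are not given in this record. The function names, the default constants (alpha = 3.0/s, beta = 20.0/s, gamma = 0.9), and the example command values are illustrative assumptions.

```python
import math


def phrase_response(t, alpha=3.0):
    """Phrase component: impulse response alpha^2 * t * exp(-alpha * t), zero for t < 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent component: step response 1 - (1 + beta * t) * exp(-beta * t), capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb      -- baseline frequency in Hz
    phrases -- list of (onset_time, magnitude) phrase commands
    accents -- list of (onset_time, offset_time, amplitude) accent commands
    """
    log_f0 = math.log(fb)
    # Superposition: every command adds its contribution in the log-F0 domain.
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        rise = accent_response(t - t1, beta, gamma)
        fall = accent_response(t - t2, beta, gamma)
        log_f0 += aa * (rise - fall)
    return math.exp(log_f0)


if __name__ == "__main__":
    # Hypothetical commands: one phrase command and one accent command.
    phrases = [(0.0, 0.5)]
    accents = [(0.3, 0.7, 0.4)]
    for i in range(0, 16):
        t = i * 0.1
        print(f"t={t:.1f}s  F0={fujisaki_f0(t, 120.0, phrases, accents):.1f} Hz")
```

Before any command starts, the contour sits at the baseline `fb`; each command then raises log F0 independently of the others, which is what makes the contour decomposable into phrase-level and accent-level parts.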
Pages: 1051-1055
Number of pages: 5
Related papers
50 in total
  • [11] Frequency Warping for Speaker Adaptation in HMM-based Speech Synthesis
    Gao, Weixun
    Cao, Qiying
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2014, 30 (04) : 1149 - 1166
  • [12] Croatian HMM-based speech synthesis
    Department of Informatics, Faculty of Philosophy, University of Rijeka, Omladinska 14, Rijeka 51000, Croatia
    J. Compt. Inf. Technol., 2006, 4 (307-313):
  • [13] HMM-Based Vietnamese Speech Synthesis
    Trinh Quoc Son
    2015 IEEE/ACIS 14TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2015, : 349 - 353
  • [14] Robustness of HMM-based Speech Synthesis
    Yamagishi, Junichi
    Ling, Zhenhua
    King, Simon
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 581 - 584
  • [15] Czech HMM-Based Speech Synthesis
    Hanzlicek, Zdenek
    TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 291 - 298
  • [16] Superpositional HMM-Based Intonation Synthesis Using a Functional F0 Model
    Jinfu Ni
    Yoshinori Shiga
    Chiori Hori
    Journal of Signal Processing Systems, 2016, 82 : 273 - 286
  • [17] Arabic HMM-based Speech Synthesis
    Khalil, Krichi Mohamed
    Adnan, Cherif
    2013 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND SOFTWARE APPLICATIONS (ICEESA), 2013, : 450 - 454
  • [18] HMM-Based Vietnamese Speech Synthesis
    Trinh, Son
    Hoang, Kiem
    INTERNATIONAL JOURNAL OF SOFTWARE INNOVATION, 2015, 3 (04) : 33 - 47
  • [19] Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models
    Hirose, Keikichi
    Sun, Qinghua
    Minematsu, Nobuaki
    ARCHIVES OF ACOUSTICS, 2007, 32 (01) : 41 - 50
  • [20] Fundamental Frequency Contour Reshaping in HMM-based Speech Synthesis and Realization of Prosodic Focus Using Generation Process Model
    Hirose, Keikichi
    Hashimoto, Hiroya
    Ikeshima, Jun
    Minematsu, Nobuaki
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SPEECH PROSODY, VOLS I AND II, 2012, : 171 - 174