Superpositional HMM-based intonation synthesis using a functional F0 model

被引:0
|
作者
Ni, Jinfu [1 ]
Shiga, Yoshinori [1 ]
Hori, Chiori [1 ]
机构
[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Univ Commun Res Inst, Kyoto, Japan
关键词
Intonation synthesis; HMM-based speech synthesis; functional F0 model; making focal prominence; prosody; AUTOMATIC EXTRACTION; SPEECH;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper addresses intonation synthesis combining statistical and functional approach with manipulation of fundamental frequency (F-0) contours in HMM-based speech synthesis. An F-0 contour is represented as a sum of micro, accent, and register components at the logarithmic scale, which is rooted in the Fujisaki model. Separated context-dependent (CD) HMMs are trained for each type of components extracted from a speech corpus based on a functional F-0 model. At the phase of synthesis, CDHMM-generated micro, accent, and register components are superimposed to form F-0 contours for input text. Objective and subjective evaluations are carried out on a Japanese speech corpus. Compared with the conventional approach, this method not only demonstrates the improved performance in naturalness of synthetic speech by achieving better global F-0 behaviors but also shows its flexibility for intonation manipulation through modifying the functional model parameters.
引用
收藏
页码:270 / 274
页数:5
相关论文
共 50 条
  • [1] Superpositional HMM-Based Intonation Synthesis Using a Functional F0 Model
    Ni, Jinfu
    Shiga, Yoshinori
    Hori, Chiori
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02): : 273 - 286
  • [2] Superpositional HMM-Based Intonation Synthesis Using a Functional F0 Model
    Jinfu Ni
    Yoshinori Shiga
    Chiori Hori
    Journal of Signal Processing Systems, 2016, 82 : 273 - 286
  • [3] Investigation of Prosodic F0 Layers in Hierarchical F0 Modeling for HMM-based Speech Synthesis
    Lei, Ming
    Wu, Yi-Jian
    Ling, Zhen-Hua
    Dai, Li-Rong
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 613 - +
  • [4] Asynchronous F0 and Spectrum Modeling for HMM-Based Speech Synthesis
    Wang, Cheng-Cheng
    Ling, Zhen-Hua
    Dai, Li-Rong
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 412 - 415
  • [5] A Hierarchical F0 Modeling Method for HMM-based Speech Synthesis
    Lei, Ming
    Wu, Yi-Jian
    Soong, Frank K.
    Ling, Zhen-Hua
    Dai, Li-Rong
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2170 - +
  • [6] HMM-Based Voice Conversion Using Quantized F0 Context
    Nose, Takashi
    Ota, Yuhei
    Kobayashi, Takao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): : 2483 - 2490
  • [7] Context-dependent additive log F0 model for HMM-based speech synthesis
    Zen, Heiga
    Braunschweiler, Norbert
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2039 - 2042
  • [8] CROSS-STREAM DEPENDENCY MODELING USING CONTINUOUS F0 MODEL FOR HMM-BASED SPEECH SYNTHESIS
    Wang, Xin
    Ling, Zhen-Hua
    Dai, Li-Rong
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 84 - 87
  • [9] Soft context clustering for F0 modeling in HMM-based speech synthesis
    Khorram, Soheil
    Sameti, Hossein
    King, Simon
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015,
  • [10] Soft context clustering for F0 modeling in HMM-based speech synthesis
    Soheil Khorram
    Hossein Sameti
    Simon King
    EURASIP Journal on Advances in Signal Processing, 2015