Review of F0 modelling and generation in HMM based speech synthesis

Times cited: 0
Author
Yu, Kai [1]
Affiliation
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
Keywords
statistical speech synthesis; HMM based synthesis; F0; modelling
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
Fundamental frequency, or F0, is a critical factor in synthesising speech that is both natural and expressive. In HMM-based speech synthesis, the modelling and generation of F0 is one of the key difficulties that differentiate synthesis from recognition. Firstly, F0 is normally treated as a discontinuous function of time whose values are partly continuous (in voiced regions) and partly discrete (an "unvoiced" label in unvoiced regions). This raises two issues that F0 modelling and generation must address: the voiced/unvoiced decision and the F0 trajectory. Secondly, F0 is supra-segmental, which means it should be modelled beyond the traditional phoneme level. Thirdly, the purpose of F0 modelling is not only to produce high-quality synthetic speech in general but also to convey expressiveness effectively. This requires explicitly linking F0 modelling to linguistic, paralinguistic and non-linguistic information so that F0 can be controlled easily. This paper reviews state-of-the-art frameworks that address these issues and discusses possible future research directions.
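As a minimal illustration of why the partly continuous, partly discrete nature of F0 leads to two sub-problems, the Python sketch below splits a frame-level F0 track into a discrete voiced/unvoiced stream and a continuous log-F0 stream, then merges them back. It assumes, hypothetically, that unvoiced frames are marked with 0.0 Hz (a common extractor convention). The names `split_f0` and `merge_f0` are illustrative only and do not come from the paper or from any toolkit. The sketch shows the representation, not any particular model of either stream.

```python
import math
from typing import List, Optional, Tuple

# Assumption (hypothetical convention): a frame-level F0 track in Hz where
# 0.0 marks an unvoiced frame.
UNVOICED = 0.0


def split_f0(f0_hz: List[float]) -> Tuple[List[bool], List[Optional[float]]]:
    """Split an F0 track into its discrete part (a voiced/unvoiced flag per frame)
    and its continuous part (log-F0, defined only on voiced frames)."""
    vuv = [v > UNVOICED for v in f0_hz]
    lf0 = [math.log(v) if voiced else None for v, voiced in zip(f0_hz, vuv)]
    return vuv, lf0


def merge_f0(vuv: List[bool], lf0: List[Optional[float]]) -> List[float]:
    """Rebuild the F0 track: the V/UV decision selects which frames carry a value,
    and the log-F0 trajectory supplies that value."""
    return [math.exp(x) if voiced and x is not None else UNVOICED
            for voiced, x in zip(vuv, lf0)]


if __name__ == "__main__":
    track = [0.0, 0.0, 120.0, 125.0, 0.0, 130.0]
    vuv, lf0 = split_f0(track)
    print(vuv)  # [False, False, True, True, False, True]
    print(merge_f0(vuv, lf0))
```

Because the log-F0 stream is undefined on unvoiced frames, a single continuous density cannot model the whole track directly; a synthesiser must handle both the flag sequence and the value sequence.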
Pages: 599-604
Number of pages: 6