Review of F0 modelling and generation in HMM based speech synthesis

Cited by: 0

Authors:

Yu, Kai [1]

Affiliation:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China

Source:

PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3 | 2012

Keywords:

statistical speech synthesis; HMM based synthesis; F0; modelling;

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Fundamental frequency, or F0, is a critical factor in synthesising speech that is both natural and expressive. In HMM-based speech synthesis, the modelling and generation of F0 is one of the key difficulties that differentiate synthesis from recognition. Firstly, F0 values are normally considered a discontinuous function of time, whose domain is partly continuous and partly discrete. This raises two issues in F0 modelling and generation: the voiced/unvoiced decision and the F0 trajectory. Secondly, F0 is supra-segmental, which means it should be modelled beyond the traditional phoneme level. Thirdly, the purpose of F0 modelling is not only to produce high-quality synthetic speech in general, but also to deliver expressiveness effectively. This requires explicitly linking F0 modelling to (para/non-)linguistic information so that controlling F0 is easy and feasible. This paper reviews the state-of-the-art frameworks for addressing these issues. Possible future research directions are also discussed.
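As an illustration of the voiced/unvoiced decision and F0 trajectory issues named in the abstract, the following minimal Python sketch separates an F0 track into a discrete voicing flag per frame and a continuous contour. It is not taken from the paper: the function name `split_f0`, the use of `None` to mark unvoiced frames, and the choice of linear interpolation across unvoiced gaps are all assumptions made for this example.

```python
from typing import List, Optional, Tuple


def split_f0(track: List[Optional[float]]) -> Tuple[List[bool], List[float]]:
    """Split an F0 track into voicing flags and a gap-filled contour.

    Assumption for this sketch: None marks an unvoiced frame and a float
    gives the F0 value (in Hz) of a voiced frame.
    """
    # Discrete part: one voiced/unvoiced flag per frame.
    voiced = [value is not None for value in track]
    voiced_idx = [i for i, flag in enumerate(voiced) if flag]
    if not voiced_idx:
        raise ValueError("track has no voiced frames")

    # Continuous part: keep voiced values, fill unvoiced frames.
    contour: List[float] = []
    for i, value in enumerate(track):
        if value is not None:
            contour.append(value)
            continue
        prev = max((j for j in voiced_idx if j < i), default=None)
        nxt = min((j for j in voiced_idx if j > i), default=None)
        if prev is None:
            # Leading unvoiced frames: copy the first voiced value.
            contour.append(track[nxt])
        elif nxt is None:
            # Trailing unvoiced frames: copy the last voiced value.
            contour.append(track[prev])
        else:
            # Interior gap: linear interpolation between neighbours.
            weight = (i - prev) / (nxt - prev)
            contour.append(track[prev] + weight * (track[nxt] - track[prev]))
    return voiced, contour


if __name__ == "__main__":
    flags, filled = split_f0([None, 100.0, None, None, 130.0, None])
    print(flags)
    print(filled)
```

Keeping the flags and the contour as two separate sequences mirrors the split between the discrete and continuous parts of F0 described above. Other gap-filling schemes could be substituted for the linear interpolation used here.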

Pages: 599-604

Page count: 6

