A Targets-based Superpositional Model of Fundamental Frequency Contours Applied to HMM-based Speech Synthesis

Cited by: 0

Authors:

Ni, Jinfu [1]

Shiga, Yoshinori [1]

Hori, Chiori [1]

Kidawara, Yutaka [1]

Affiliation:

[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Universal Commun Res Inst, Kyoto, Japan

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

Prosody modeling; Superpositional F0 model; Continuous F0 modeling; HMM-based speech synthesis;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A superpositional model of fundamental frequency (F0) contours, as suggested by the Fujisaki model, can represent the F0 movements of speech well while keeping a clear relation to the linguistic information of utterances. HMM-based speech synthesis is therefore expected to benefit from this property of superpositional models. In this paper, a targets-based superpositional model is proposed in the light of the Fujisaki model. Here, both the accent and phrase components are parameterized by separately defined low and high targets, which allow flexible interaction between the accent and phrase components. Owing to this flexible interaction, the proposed method handles complex F0 movements such as low digging, varying declination, and final lowering in a consistent way, simply by adjusting parameter values. This facilitates extracting the model parameters from observed F0 contours, a difficulty that has been one of the major obstacles to using the Fujisaki model. Extraction of the target parameters is evaluated on a Japanese speech corpus, and the F0 contours generated by the model are used for HMM training in place of the original contours. Listening tests of the synthetic speech indicate significant improvements in speech quality. Micro-prosodic effects are also investigated; the results show that adding micro-prosody to the generated F0 contours does not significantly improve speech quality.
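The abstract builds on the Fujisaki model, in which log F0 is the sum of a baseline value, phrase components (responses to impulse-like phrase commands), and accent components (responses to step-like accent commands). As a point of reference only, the standard Fujisaki formulation can be sketched in Python as below. This is not the authors' targets-based model: its low/high target parameterization is not described in enough detail here to reproduce. The constants (alpha = 3/s, beta = 20/s, gamma = 0.9) are assumed typical values, and the function names are hypothetical.

```python
import math

# Assumed typical Fujisaki-model constants (not taken from the paper).
ALPHA = 3.0   # natural angular frequency of the phrase-control mechanism, 1/s
BETA = 20.0   # natural angular frequency of the accent-control mechanism, 1/s
GAMMA = 0.9   # ceiling of the accent-control response


def phrase_response(t: float, alpha: float = ALPHA) -> float:
    """Impulse response of the phrase-control mechanism, Gp(t)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t: float, beta: float = BETA, gamma: float = GAMMA) -> float:
    """Step response of the accent-control mechanism, Ga(t), clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t: float, fb: float, phrases, accents) -> float:
    """Return F0 in Hz at time t (seconds) by superposing components in log F0.

    fb:      baseline frequency in Hz
    phrases: list of (onset_time, amplitude) phrase commands
    accents: list of (onset_time, offset_time, amplitude) accent commands
    """
    log_f0 = math.log(fb)
    # Each phrase command adds a slowly rising and decaying hump (declination).
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0)
    # Each accent command adds a plateau between its onset and offset.
    for t1, t2, aa in accents:
        log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return math.exp(log_f0)
```

For example, `[fujisaki_f0(i * 0.005, 80.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)]) for i in range(300)]` samples a 1.5 s contour at a 5 ms frame rate with one phrase command and one accent command. Because components add in the log domain, every movement must be explained by command timings and amplitudes; the paper's targets-based parameterization is motivated by the difficulty of extracting such commands from observed contours.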


Pages: 1051-1055

Number of pages: 5
