A Targets-based Superpositional Model of Fundamental Frequency Contours Applied to HMM-based Speech Synthesis

Cited by: 0

Authors:

Ni, Jinfu [1]

Shiga, Yoshinori [1]

Hori, Chiori [1]

Kidawara, Yutaka [1]

Affiliations:

[1] Natl Inst Informat & Commun Technol, Spoken Language Commun Lab, Universal Commun Res Inst, Kyoto, Japan

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

Prosody modeling; Superpositional F0 model; Continuous F0 modeling; HMM-based speech synthesis;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The superpositional model of fundamental frequency (F0) contours suggested by the Fujisaki model represents the F0 movements of speech well while keeping a clear relation with the linguistic information of utterances. HMM-based speech synthesis is therefore expected to improve by exploiting the merits of the superpositional model. In this paper, a targets-based superpositional model is proposed in the light of the Fujisaki model. Here, both accent and phrase components are parameterized by respectively defined low and high targets, which allow flexible interaction between the accent and phrase components. Owing to this flexible interaction, the new method consistently treats complex F0 movements such as low digging, varying declination, and final lowering by simply adjusting parameter values. This facilitates extraction of the model parameters from observed F0 contours, which is one of the major problems preventing the use of the Fujisaki model. Extraction of the target parameters is evaluated on a Japanese speech corpus, and the F0 contours generated by the model are used for HMM training instead of the original contours. A listening test of synthetic speech indicates significant improvements in speech quality. Micro-prosodic effects are also investigated. The results show that adding micro-prosody to the generated F0 contours does not significantly improve speech quality.


Pages: 1051-1055

Page count: 5
