USE OF FUNDAMENTAL FREQUENCIES SHAPED BY GENERATION PROCESS MODEL FOR HMM-BASED SPEECH SYNTHESIS

被引：0

作者：

Hirose, Keikichi ^{[1
]}

Hashimoto, Hiroya ^{[2
]}

Hyakutake, Kyota ^{[2
]}

Saito, Daisuke ^{[1
]}

Minematsu, Nobuaki ^{[2
]}

机构：

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Tokyo, Japan

[2] Univ Tokyo, Dept Elect Engn & Informat Syst, Grad Sch Engn, Tokyo, Japan

来源：

2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) | 2014年

关键词：

Generation process model; HMM-based speech synthesis; F-0; residual; Flexible F-0 control;

D O I：

暂无

中图分类号：

TM [电工技术]; TN [电子技术、通信技术];

学科分类号：

0808 ; 0809 ;

摘要：

Generation process model of fundamental frequency (F-0) contours is known to represent global movements of F-0's keeping a clear relation with linguistic information of utterances. While HMMbased speech synthesis can generate a good quality of speech, problems, which arise from frame-by-frame processing, are pointed out. These problems are expected to be solved by incorporating the model constraints. A method is developed to use F-0 contours approximated by the model for HMM training instead of observed F-0 contours. A clear improvement in the quality of synthetic speech is shown through listening experiments. In the method, fragments of F-0 contours not represented by the model (F-0 residuals) are ignored. A scheme is further introduced to cope with the issue; F-0 residuals are also included in the training and synthesis processes of HMM-based speech synthesis, and the generated F-0 residuals are added to the model-based Fo's before the waveform generation. The model constraint has another merit; relations between generated F-0 contours and texts are clear, and it is possible to add linguistic information such as emphasis to synthetic speech, or to change speaking styles through manipulating Fo's in the F-0 model framework. Several experimental results supporting the advantages of the method are shown.

引用

页码：555 / 560

页数：6

共 50 条

[21] An acoustic model adaptation using hmm-based speech synthesis
Tanaka, K
Kuroiwa, S
Tsuge, S
Ren, F
2003 INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, PROCEEDINGS, 2003, : 368 - 373
[22] Czech HMM-Based Speech Synthesis: Experiments with Model Adaptation
Hanzlicek, Zdenek
TEXT, SPEECH AND DIALOGUE, TSD 2011, 2011, 6836 : 107 - 114
[23] Generation of creaky voice for improving the quality of HMM-based speech synthesis
Narendra, N. P.
Rao, K. Sreenivasa
COMPUTER SPEECH AND LANGUAGE, 2017, 42 : 38 - 58
[24] A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
Toda, Tomoki
Tokuda, Keiichi
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (05): : 816 - 824
[25] SPEECH PARAMETER GENERATION CONSIDERING LSP ORDERING PROPERTY FOR HMM-BASED SPEECH SYNTHESIS
Qian, Shijun
Wang, Huanliang
Pei, Wenjiang
Zou, Ping
Wang, Kai
2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 330 - 334
[26] Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis
Wen, Zhengqi
Tao, Jianhua
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1426 - 1429
[27] Creation of HMM-based Speech Model for Estonian Text-to-Speech Synthesis
Nurk, Tonis
HUMAN LANGUAGE TECHNOLOGIES: THE BALTIC PERSPECTIVE, 2012, 247 : 162 - 168
[28] A speech parameter generation algorithm using local variance for HMM-based speech synthesis
Chunwijitra, Vataya
Nose, Takashi
Kobayashi, Takao
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1150 - 1153
[29] HMM-Based Speech Synthesis for the Greek Language
Karabetsos, Sotiris
Tsiakoulis, Pirros
Chalamandaris, Aimilios
Raptis, Spyros
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2008, 5246 : 349 - 356
[30] A BAYESIAN APPROACH TO HMM-BASED SPEECH SYNTHESIS
Hashimoto, Kei
Zen, Heiga
Nankaku, Yoshihiko
Masuko, Takashi
Tokuda, Keiichi
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4029 - +

← 1 2 3 4 5 →