USE OF FUNDAMENTAL FREQUENCIES SHAPED BY GENERATION PROCESS MODEL FOR HMM-BASED SPEECH SYNTHESIS

Cited by: 0

Authors:

Hirose, Keikichi [1]

Hashimoto, Hiroya [2]

Hyakutake, Kyota [2]

Saito, Daisuke [1]

Minematsu, Nobuaki [2]

Affiliations:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Tokyo, Japan

[2] Univ Tokyo, Dept Elect Engn & Informat Syst, Grad Sch Engn, Tokyo, Japan

Source:

2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) | 2014

Keywords:

Generation process model; HMM-based speech synthesis; F0; residual; Flexible F0 control

DOI:

Not available

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

The generation process model of fundamental frequency (F0) contours is known to represent the global movements of F0 while keeping a clear relation to the linguistic information of utterances. While HMM-based speech synthesis can generate speech of good quality, problems arising from its frame-by-frame processing have been pointed out. These problems are expected to be solved by incorporating the model constraints. A method is developed that uses F0 contours approximated by the model, instead of observed F0 contours, for HMM training. A clear improvement in the quality of synthetic speech is shown through listening experiments. In this method, fragments of F0 contours not represented by the model (F0 residuals) are ignored. A scheme is further introduced to cope with this issue: F0 residuals are also included in the training and synthesis processes of HMM-based speech synthesis, and the generated F0 residuals are added to the model-based F0's before waveform generation. The model constraint has another merit: relations between generated F0 contours and texts are clear, so it is possible to add linguistic information such as emphasis to synthetic speech, or to change speaking styles, by manipulating F0's within the F0 model framework. Several experimental results supporting the advantages of the method are shown.

Citation

Pages: 555-560

Page count: 6
