USE OF FUNDAMENTAL FREQUENCIES SHAPED BY GENERATION PROCESS MODEL FOR HMM-BASED SPEECH SYNTHESIS

Cited by: 0

Authors:

Hirose, Keikichi [1]

Hashimoto, Hiroya [2]

Hyakutake, Kyota [2]

Saito, Daisuke [1]

Minematsu, Nobuaki [2]

Affiliations:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Tokyo, Japan

[2] Univ Tokyo, Dept Elect Engn & Informat Syst, Grad Sch Engn, Tokyo, Japan

Source:

2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) | 2014

Keywords:

Generation process model; HMM-based speech synthesis; F0; residual; Flexible F0 control

DOI:

Not available

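Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809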

Abstract:

The generation process model of fundamental frequency (F0) contours is known to represent the global movements of F0 while keeping a clear relation with the linguistic information of utterances. Although HMM-based speech synthesis can generate speech of good quality, problems arising from its frame-by-frame processing have been pointed out. These problems are expected to be solved by incorporating the model constraints. A method is developed that uses F0 contours approximated by the model, instead of observed F0 contours, for HMM training. Listening experiments show a clear improvement in the quality of synthetic speech. In this method, the fragments of F0 contours not represented by the model (F0 residuals) are ignored. A scheme is further introduced to cope with this issue: F0 residuals are also included in the training and synthesis processes of HMM-based speech synthesis, and the generated F0 residuals are added to the model-based F0s before waveform generation. The model constraint has another merit: the relations between generated F0 contours and texts are clear, so it is possible to add linguistic information such as emphasis to synthetic speech, or to change speaking styles, by manipulating F0s within the F0 model framework. Several experimental results supporting the advantages of the method are shown.
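
The residual scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that residuals are taken in the log-F0 domain, that unvoiced frames are marked with an F0 of 0, and that the model-based contour is already available; the function names `split_residual` and `add_residual` and all F0 values are hypothetical. In an actual system the residuals added back at synthesis time would be generated by the HMMs rather than copied from the analysis step as they are here.

```python
import math

def split_residual(observed_f0, model_f0):
    """Per-frame log-F0 residual: observed minus model-based F0.

    Assumption: unvoiced frames are marked with observed F0 == 0
    and get a residual of None.
    """
    residuals = []
    for obs, mod in zip(observed_f0, model_f0):
        if obs > 0.0:
            residuals.append(math.log(obs) - math.log(mod))
        else:
            residuals.append(None)
    return residuals

def add_residual(model_f0, residuals):
    """Add residuals back to the model-based F0 before waveform generation."""
    combined = []
    for mod, res in zip(model_f0, residuals):
        if res is None:
            combined.append(0.0)  # keep the frame unvoiced
        else:
            combined.append(math.exp(math.log(mod) + res))
    return combined

# Hypothetical five-frame contours in Hz (0.0 marks an unvoiced frame).
observed = [0.0, 118.0, 124.0, 131.0, 0.0]
model = [115.0, 120.0, 125.0, 128.0, 126.0]

residuals = split_residual(observed, model)
reconstructed = add_residual(model, residuals)
```

Ignoring the residuals, as in the first method of the abstract, corresponds to using `model` directly on voiced frames; adding them back recovers the fine movements that the model does not represent.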


Pages: 555-560

Page count: 6
