USE OF FUNDAMENTAL FREQUENCIES SHAPED BY GENERATION PROCESS MODEL FOR HMM-BASED SPEECH SYNTHESIS

被引:0
|
作者
Hirose, Keikichi [1 ]
Hashimoto, Hiroya [2 ]
Hyakutake, Kyota [2 ]
Saito, Daisuke [1 ]
Minematsu, Nobuaki [2 ]
机构
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Tokyo, Japan
[2] Univ Tokyo, Dept Elect Engn & Informat Syst, Grad Sch Engn, Tokyo, Japan
来源
2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) | 2014年
关键词
Generation process model; HMM-based speech synthesis; F-0; residual; Flexible F-0 control;
D O I
暂无
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Generation process model of fundamental frequency (F-0) contours is known to represent global movements of F-0's keeping a clear relation with linguistic information of utterances. While HMMbased speech synthesis can generate a good quality of speech, problems, which arise from frame-by-frame processing, are pointed out. These problems are expected to be solved by incorporating the model constraints. A method is developed to use F-0 contours approximated by the model for HMM training instead of observed F-0 contours. A clear improvement in the quality of synthetic speech is shown through listening experiments. In the method, fragments of F-0 contours not represented by the model (F-0 residuals) are ignored. A scheme is further introduced to cope with the issue; F-0 residuals are also included in the training and synthesis processes of HMM-based speech synthesis, and the generated F-0 residuals are added to the model-based Fo's before the waveform generation. The model constraint has another merit; relations between generated F-0 contours and texts are clear, and it is possible to add linguistic information such as emphasis to synthetic speech, or to change speaking styles through manipulating Fo's in the F-0 model framework. Several experimental results supporting the advantages of the method are shown.
引用
收藏
页码:555 / 560
页数:6
相关论文
共 50 条
  • [31] An HMM-based Vietnamese Speech Synthesis System
    Vu, Thang Tat
    Luong, Mai Chi
    Nakamura, Satoshi
    ORIENTAL COCOSDA 2009 - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2009, : 116 - +
  • [32] An HMM-based Cantonese Speech Synthesis System
    Wang, Xin
    Wu, Zhiyong
    2012 IEEE GLOBAL HIGH TECH CONGRESS ON ELECTRONICS (GHTCE), 2012,
  • [33] Unsupervised adaptation for HMM-based speech synthesis
    King, Simon
    Tokuda, Keiichi
    Zen, Heiga
    Yamagishi, Junichi
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1869 - +
  • [34] On the Use of Extended Context for HMM-based Spontaneous Conversational Speech Synthesis
    Koriyama, Tomoki
    Nose, Takashi
    Kobayashi, Takao
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2668 - 2671
  • [35] Thousands of Voices for HMM-based Speech Synthesis
    Yamagishi, Junichi
    Usabaev, Bela
    King, Simon
    Watts, Oliver
    Dines, John
    Tian, Jilei
    Hu, Rile
    Guan, Yong
    Oura, Keiichiro
    Tokuda, Keiichi
    Karhila, Reima
    Kurimo, Mikko
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 416 - +
  • [36] Analysis of HMM-Based Lombard Speech Synthesis
    Raitio, Tuomo
    Suni, Antti
    Vainio, Martti
    Alku, Paavo
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2792 - +
  • [37] A Parameter Generation Algorithm Using Local Variance for HMM-Based Speech Synthesis
    Nose, Takashi
    Chunwijitra, Vataya
    Kobayashi, Takao
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 221 - 228
  • [38] Parameter Generation Considering LSP Ordering Property for HMM-Based Speech Synthesis
    Qian, Shijun
    Wang, Huanliang
    Pei, Wenjiang
    Wang, Kai
    IEEE SIGNAL PROCESSING LETTERS, 2012, 19 (08) : 467 - 470
  • [39] A training method of average voice model for HMM-based speech synthesis
    Yamagishi, J
    Tamura, M
    Masuko, T
    Tokuda, K
    Kobayashi, T
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2003, E86A (08) : 1956 - 1963
  • [40] HMM-based emotional speech synthesis using average emotion model
    Qin, Long
    Ling, Zhen-Hua
    Wu, Yi-Jian
    Zhang, Bu-Fan
    Wang, Ren-Hua
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 233 - +