Integration of Spectral Feature Extraction and Modeling for HMM-Based Speech Synthesis

Cited by: 2
Authors
Nakamura, Kazuhiro [1]
Hashimoto, Kei [1]
Nankaku, Yoshihiko [1]
Tokuda, Keiichi [1]
Affiliations
[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi 4668555, Japan
Source
Funding
Japan Science and Technology Agency;
Keywords
integrative model; HMM-based speech synthesis; acoustic modeling; mel-cepstral analysis; trajectory HMM; HIDDEN MARKOV-MODELS; GENERATION;
DOI
10.1587/transinf.E97.D.1438
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
This paper proposes a novel approach for integrating spectral feature extraction and acoustic modeling in hidden Markov model (HMM) based speech synthesis. The statistical modeling process of speech waveforms is typically divided into two component modules: the frame-by-frame feature extraction module and the acoustic modeling module. In the feature extraction module, the statistical mel-cepstral analysis technique has been used and the objective function is the likelihood of mel-cepstral coefficients for given speech waveforms. In the acoustic modeling module, the objective function is the likelihood of model parameters for given mel-cepstral coefficients. It is important to improve the performance of each component module for achieving higher quality synthesized speech. However, the final objective of speech synthesis systems is to generate natural speech waveforms from given texts, and the improvement of each component module does not always lead to the improvement of the quality of synthesized speech. Therefore, ideally all objective functions should be optimized based on an integrated criterion which well represents subjective speech quality of human perception. In this paper, we propose an approach to model speech waveforms directly and optimize the final objective function. Experimental results show that the proposed method outperformed the conventional methods in objective and subjective measures.
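The contrast drawn in the abstract can be written out as a rough sketch. The notation is an assumption introduced here and does not come from this record: $\mathbf{x}$ denotes the speech waveform, $\mathbf{c}$ the mel-cepstral coefficient sequence, and $\lambda$ the acoustic model parameters. The marginalization over $\mathbf{c}$ is one plausible reading of "model speech waveforms directly", not necessarily the formulation used in the paper.

```latex
% Conventional two-stage pipeline, as described in the abstract:
% stage 1 fits mel-cepstra to the waveform,
% stage 2 fits the acoustic model to the extracted mel-cepstra.
\hat{\mathbf{c}} = \arg\max_{\mathbf{c}} \, p(\mathbf{x} \mid \mathbf{c}),
\qquad
\hat{\lambda} = \arg\max_{\lambda} \, p(\hat{\mathbf{c}} \mid \lambda)

% Integrated criterion (assumed form): a single objective on the waveform,
% with the mel-cepstra treated as a latent variable.
\hat{\lambda} = \arg\max_{\lambda} \, p(\mathbf{x} \mid \lambda)
              = \arg\max_{\lambda} \int p(\mathbf{x} \mid \mathbf{c}) \,
                p(\mathbf{c} \mid \lambda) \, d\mathbf{c}
```

Under this reading, the two-stage pipeline optimizes each factor separately, while the integrated criterion scores the model parameters directly against the waveform.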
Pages: 1438-1448
Number of pages: 11