Integration of Spectral Feature Extraction and Modeling for HMM-Based Speech Synthesis

Cited by: 2

Authors:

Nakamura, Kazuhiro [1]

Hashimoto, Kei [1]

Nankaku, Yoshihiko [1]

Tokuda, Keiichi [1]

Affiliations:

[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi 4668555, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2014 / Vol. E97D / No. 06

Funding:

Japan Science and Technology Agency;

Keywords:

integrative model; HMM-based speech synthesis; acoustic modeling; mel-cepstral analysis; trajectory HMM; HIDDEN MARKOV-MODELS; GENERATION;

DOI:

10.1587/transinf.E97.D.1438

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper proposes a novel approach for integrating spectral feature extraction and acoustic modeling in hidden Markov model (HMM) based speech synthesis. The statistical modeling process of speech waveforms is typically divided into two component modules: the frame-by-frame feature extraction module and the acoustic modeling module. In the feature extraction module, the statistical mel-cepstral analysis technique has been used and the objective function is the likelihood of mel-cepstral coefficients for given speech waveforms. In the acoustic modeling module, the objective function is the likelihood of model parameters for given mel-cepstral coefficients. It is important to improve the performance of each component module for achieving higher quality synthesized speech. However, the final objective of speech synthesis systems is to generate natural speech waveforms from given texts, and the improvement of each component module does not always lead to the improvement of the quality of synthesized speech. Therefore, ideally all objective functions should be optimized based on an integrated criterion which well represents subjective speech quality of human perception. In this paper, we propose an approach to model speech waveforms directly and optimize the final objective function. Experimental results show that the proposed method outperformed the conventional methods in objective and subjective measures.


Pages: 1438-1448

Page count: 11

