Global Variance Modeling on the Log Power Spectrum of LSPs for HMM-based Speech Synthesis

Cited by: 0

Authors:

Ling, Zhen-Hua [1]

Hu, Yu [1]

Dai, Li-Rong [1]

Affiliations:

[1] Univ Sci & Technol China, iFLYTEK Speech Lab, Hefei, Peoples R China

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010

Keywords:

speech synthesis; hidden Markov model; global variance; power spectrum

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a method for modeling the global variance (GV) of the log power spectra derived from the line spectral pairs (LSPs) of a sentence in HMM-based parametric speech synthesis. Unlike the conventional GV method, where the observations for GV model training are the variances of the spectral parameters of each training sentence, the proposed method directly models the temporal variance of each frequency point in the spectral envelope reconstructed from the LSPs. At the synthesis stage, the likelihood function of the trained GV model is integrated into the maximum likelihood parameter generation algorithm to alleviate the over-smoothing effect on the generated spectral structures. Experimental results show that the proposed method outperforms the conventional GV method when LSPs are used as the spectral parameters, and that it significantly improves the naturalness of synthetic speech.
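The quantity described in the abstract, the temporal variance of each frequency point of the log power spectrum reconstructed from LSPs, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`lsp_to_lpc`, `log_power_spectrum`, `spectral_gv`) are hypothetical, the LSPs are assumed to be sorted line spectral frequencies in radians with an even order, the gain is assumed to be 1, and the population variance is used. The paper's exact formulation may differ.

```python
import numpy as np


def lsp_to_lpc(lsf):
    """Convert sorted LSFs in radians (even order p) to [1, a_1, ..., a_p]."""
    lsf = np.asarray(lsf, dtype=float)
    if lsf.size % 2 != 0:
        raise ValueError("this sketch assumes an even LSP order")
    p_poly = np.array([1.0, 1.0])   # factor (1 + z^-1) of the symmetric polynomial
    q_poly = np.array([1.0, -1.0])  # factor (1 - z^-1) of the antisymmetric polynomial
    for i, w in enumerate(lsf):
        quad = np.array([1.0, -2.0 * np.cos(w), 1.0])
        if i % 2 == 0:  # 1st, 3rd, ... frequencies go to the symmetric polynomial
            p_poly = np.convolve(p_poly, quad)
        else:           # 2nd, 4th, ... frequencies go to the antisymmetric one
            q_poly = np.convolve(q_poly, quad)
    a = 0.5 * (p_poly + q_poly)
    return a[:-1]  # the highest-order coefficient cancels to zero


def log_power_spectrum(lsf, n_bins=257):
    """Log power spectrum of 1/A(z) at n_bins points from 0 to pi (unit gain assumed)."""
    a = lsp_to_lpc(lsf)
    A = np.fft.rfft(a, n=2 * (n_bins - 1))
    return -np.log(np.abs(A) ** 2)


def spectral_gv(lsf_frames, n_bins=257):
    """Variance over time of each frequency bin for one sentence (frames x order)."""
    spec = np.stack([log_power_spectrum(f, n_bins) for f in lsf_frames])
    return spec.var(axis=0)


# Usage: three frames of 2nd-order LSFs (toy values, not real speech data).
frames = np.array([[0.4, 1.4], [0.5, 1.5], [0.6, 1.7]])
gv = spectral_gv(frames, n_bins=17)  # one variance per frequency bin
```

In the conventional GV method the variance would instead be taken over the LSP trajectories themselves (`frames.var(axis=0)` in this toy setup), giving one value per LSP dimension rather than one per frequency bin.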


Pages: 825-828

Number of pages: 4

50 items in total

[1] Integrating Global Variance of Log Power Spectrum Derived from LSPs into MGE Training for HMM-Based Parametric Speech Synthesis
Sun, Yu-Sheng
Ling, Zhen-Hua
Yin, Xiang
Dai, Li-Rong
[J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 201 - 205
[2] Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis
Yin, Xiang
Ling, Zhen-Hua
Lei, Ming
Dai, Li-Rong
[J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1146 - 1149
[3] GLOBAL VARIANCE MODELING ON FREQUENCY DOMAIN DELTA LSP FOR HMM-BASED SPEECH SYNTHESIS
Pan, Shifeng
Nankaku, Yoshihiko
Tokuda, Keiichi
Tao, Jianhua
[J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4716 - 4719
[4] Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis
Wu, Yi-Jian
Tokuda, Keiichi
[J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 577 - 580
[5] TRAJECTORY TRAINING CONSIDERING GLOBAL VARIANCE FOR HMM-BASED SPEECH SYNTHESIS
Toda, Tomoki
Young, Steve
[J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4025 - +
[6] A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
Toda, Tomoki
Tokuda, Keiichi
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (05): : 816 - 824
[7] Asynchronous F0 and Spectrum Modeling for HMM-Based Speech Synthesis
Wang, Cheng-Cheng
Ling, Zhen-Hua
Dai, Li-Rong
[J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 412 - 415
[8] State duration modeling for HMM-based speech synthesis
Zen, Heiga
Masuko, Takashi
Tokuda, Keiichi
Yoshimura, Takayoshi
Kobayashi, Takao
Kitamura, Tadashi
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (03): : 692 - 693
[9] Minimum generation error criterion considering global/local variance for HMM-based speech synthesis
Wu, Yi-Jian
Zen, Heiga
Nankaku, Yoshihiko
Tokuda, Keiichi
[J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4621 - 4624
[10] A POSTFILTER TO MODIFY THE MODULATION SPECTRUM IN HMM-BASED SPEECH SYNTHESIS
Takamichi, Shinnosuke
Toda, Tomoki
Neubig, Graham
Sakti, Sakriani
Nakamura, Satoshi
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
