Global Variance Modeling on the Log Power Spectrum of LSPs for HMM-based Speech Synthesis

Cited by: 0

Authors:

Ling, Zhen-Hua [1]

Hu, Yu [1]

Dai, Li-Rong [1]

Affiliation:

[1] University of Science and Technology of China, iFLYTEK Speech Lab, Hefei, People's Republic of China

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010

Keywords:

speech synthesis; hidden Markov model; global variance; power spectrum

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper presents a method for modeling the global variance (GV) of the log power spectra derived from the line spectral pairs (LSPs) of a sentence in HMM-based parametric speech synthesis. Unlike the conventional GV method, in which the observations for GV model training are the variances of the spectral parameters over each training sentence, the proposed method directly models the temporal variance at each frequency point of the spectral envelope reconstructed from the LSPs. At the synthesis stage, the likelihood function of the trained GV model is integrated into the maximum likelihood parameter generation algorithm to alleviate the over-smoothing of the generated spectral structures. Experimental results show that, when LSPs are used as the spectral parameters, the proposed method outperforms the conventional GV method and significantly improves the naturalness of synthetic speech.
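The quantity the abstract describes, the temporal variance at each frequency point of an LSP-derived spectral envelope, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`lsp_to_lpc`, `log_power_spectrum`, `spectral_gv`), the uniform frequency grid, the even LSP order, and the omission of the gain term are all assumptions made for illustration. It uses the textbook LSP-to-LPC relation A(z) = (P(z) + Q(z)) / 2.

```python
import cmath
import math


def lsp_to_lpc(lsps):
    """Convert ascending LSP frequencies (radians, even order) to the
    coefficients of A(z), using the standard A(z) = (P(z) + Q(z)) / 2."""
    def poly_mul(a, b):
        out = [0.0] * (len(a) + len(b) - 1)
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                out[i + j] += x * y
        return out

    p_poly = [1.0, 1.0]   # symmetric polynomial starts as (1 + z^-1)
    q_poly = [1.0, -1.0]  # antisymmetric polynomial starts as (1 - z^-1)
    for i, w in enumerate(lsps):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:    # 1st, 3rd, ... LSPs are roots of P(z)
            p_poly = poly_mul(p_poly, factor)
        else:             # 2nd, 4th, ... LSPs are roots of Q(z)
            q_poly = poly_mul(q_poly, factor)
    a = [(x + y) / 2.0 for x, y in zip(p_poly, q_poly)]
    return a[:-1]         # the highest-order term cancels to zero


def log_power_spectrum(lsps, num_bins=8):
    """Log power spectrum -log|A(e^{jw})|^2 on a uniform grid over [0, pi].
    The gain term is omitted (an assumption of this sketch)."""
    a = lsp_to_lpc(lsps)
    spectrum = []
    for k in range(num_bins):
        w = math.pi * k / (num_bins - 1)
        resp = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(a))
        spectrum.append(-math.log(abs(resp) ** 2))
    return spectrum


def spectral_gv(frames, num_bins=8):
    """Variance over time of the log power spectrum at each frequency bin,
    for one sentence given as a list of per-frame LSP vectors."""
    spectra = [log_power_spectrum(f, num_bins) for f in frames]
    num_frames = len(spectra)
    gv = []
    for k in range(num_bins):
        column = [s[k] for s in spectra]
        mean = sum(column) / num_frames
        gv.append(sum((x - mean) ** 2 for x in column) / num_frames)
    return gv
```

Under these assumptions, one such variance vector per training sentence would serve as an observation for a GV model, in place of the per-sentence variances of the LSPs themselves used by the conventional method.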

Pages: 825-828

Page count: 4

