SPEECH PARAMETER GENERATION CONSIDERING LSP ORDERING PROPERTY FOR HMM-BASED SPEECH SYNTHESIS

Cited by: 0
Authors
Qian, Shijun [1 ,2 ]
Wang, Huanliang [2 ]
Pei, Wenjiang [1 ]
Zou, Ping [2 ]
Wang, Kai [1 ]
Affiliations
[1] Southeast Univ, Sch Informat Sci & Engn, Nanjing, Jiangsu, Peoples R China
[2] AL Speech Co Ltd, Suzhou, Peoples R China
Funding
Specialized Research Fund for the Doctoral Program of the Ministry of Education of China;
Keywords
Speech synthesis; hidden Markov model; parameter generation; line spectral pair; ordering property;
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Subject classification codes
0808 ; 0809 ;
Abstract
Line spectral pair (LSP) parameters have many advantages for speech representation; in particular, they correlate well with spectral formants as long as they are strictly ordered and bounded. This ordering property cannot be guaranteed in HMM-based speech synthesis when LSPs are adopted as the spectral feature, because diagonal covariance matrices are used and the correlation between LSP dimensions is ignored, which causes instability in the synthesized speech. In this paper, we modify the parameter generation criterion to preserve the ordering property of the generated LSPs by considering not only the HMM and global variance (GV) likelihoods maximized in the conventional method but also a mis-ordering penalty. Experimental results show that the proposed method significantly reduces mis-orderings and achieves high-quality synthesis when the penalty weight is selected appropriately.
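The ordering property mentioned in the abstract can be illustrated with a small sketch. It assumes LSP frequencies expressed in radians, so that a valid frame satisfies 0 < w_1 < w_2 < ... < w_D < pi. The record does not give the paper's actual penalty term or generation criterion, so the function names, the squared-hinge penalty, and the `margin` and `weight` parameters below are hypothetical and only show the general idea of subtracting a weighted mis-ordering penalty from a likelihood score.

```python
import math


def count_misorderings(lsp, lower=0.0, upper=math.pi):
    """Count adjacent pairs that violate lower < w_1 < ... < w_D < upper."""
    bounded = [lower] + list(lsp) + [upper]
    return sum(1 for a, b in zip(bounded, bounded[1:]) if b <= a)


def ordering_penalty(lsp, margin=0.0, lower=0.0, upper=math.pi):
    """Hypothetical squared-hinge penalty on adjacent gaps.

    It is zero when every gap (including the gaps to the bounds)
    is at least `margin`, and grows as gaps shrink or become negative.
    """
    bounded = [lower] + list(lsp) + [upper]
    return sum(max(0.0, margin - (b - a)) ** 2
               for a, b in zip(bounded, bounded[1:]))


def penalized_objective(log_likelihood, lsp_frames, weight, margin=0.0):
    """Illustrative objective: likelihood score minus a weighted penalty.

    `log_likelihood` stands in for the combined HMM and GV likelihood
    terms; how those are computed is outside this sketch.
    """
    penalty = sum(ordering_penalty(frame, margin) for frame in lsp_frames)
    return log_likelihood - weight * penalty


if __name__ == "__main__":
    ordered = [0.3, 0.8, 0.9, 2.0]
    swapped = [0.3, 0.9, 0.8, 2.0]  # second and third LSPs are out of order
    print(count_misorderings(ordered), count_misorderings(swapped))
    print(penalized_objective(-10.0, [ordered, swapped], weight=5.0))
```

With `weight` set to zero the score reduces to the unpenalized likelihood, which is one way to read the abstract's remark that the result depends on choosing the penalty weight appropriately.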
Pages: 330-334
Page count: 5
Related papers
50 in total
  • [1] Parameter Generation Considering LSP Ordering Property for HMM-Based Speech Synthesis
    Qian, Shijun
    Wang, Huanliang
    Pei, Wenjiang
    Wang, Kai
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2012, 19 (08) : 467 - 470
  • [2] A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    Toda, Tomoki
    Tokuda, Keiichi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (05) : 816 - 824
  • [3] Speech parameter generation algorithms for HMM-based speech synthesis
    Tokuda, K
    Yoshimura, T
    Masuko, T
    Kobayashi, T
    Kitamura, T
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1315 - 1318
  • [4] PARAMETER GENERATION ALGORITHM CONSIDERING MODULATION SPECTRUM FOR HMM-BASED SPEECH SYNTHESIS
    Takamichi, Shinnosuke
    Toda, Tomoki
    Black, Alan W.
    Nakamura, Satoshi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4210 - 4214
  • [5] A speech parameter generation algorithm using local variance for HMM-based speech synthesis
    Chunwijitra, Vataya
    Nose, Takashi
    Kobayashi, Takao
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1150 - 1153
  • [6] PRESERVE ORDERING PROPERTY OF GENERATED LSPS FOR MINIMUM GENERATION ERROR TRAINING IN HMM-BASED SPEECH SYNTHESIS
    Lei, Ming
    Ling, Zhen-Hua
    Dai, Li-Rong
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4712 - 4715
  • [7] MINIMUM GENERATION ERROR TRAINING WITH WEIGHTED EUCLIDEAN DISTANCE ON LSP FOR HMM-BASED SPEECH SYNTHESIS
    Lei, Ming
    Ling, Zhen-Hua
    Dai, Li-Rong
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4230 - 4233
  • [8] A Parameter Generation Algorithm Using Local Variance for HMM-Based Speech Synthesis
    Nose, Takashi
    Chunwijitra, Vataya
    Kobayashi, Takao
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 221 - 228
  • [9] Improvements to HMM-Based Speech Synthesis Based on Parameter Generation with Rich Context Models
    Takamichi, Shinnosuke
    Toda, Tomoki
    Shiga, Yoshinori
    Sakti, Sakriani
    Neubig, Graham
    Nakamura, Satoshi
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 364 - 368
  • [10] Minimum Kullback-Leibler Divergence Parameter Generation for HMM-Based Speech Synthesis
    Ling, Zhen-Hua
    Dai, Li-Rong
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (05) : 1492 - 1502