A speech parameter generation algorithm considering global variance for HMM-based speech synthesis

Cited by: 239
Authors
Toda, Tomoki [1]
Tokuda, Keiichi [2]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma 6300101, Japan
[2] Nagoya Inst Technol, Grad Sch Engn, Nagoya, Aichi 4668555, Japan
Source
Keywords
HMM-based speech synthesis; speech parameter generation; maximum likelihood criterion; over-smoothing effect; global variance;
DOI
10.1093/ietisy/e90-d.5.816
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper describes a novel parameter generation algorithm for an HMM-based speech synthesis technique. The conventional algorithm generates a parameter trajectory of static features that maximizes the likelihood of a given HMM for the parameter sequence consisting of the static and dynamic features under an explicit constraint between those two features. The generated trajectory is often excessively smoothed due to the statistical processing. Using the over-smoothed speech parameters usually causes muffled sounds. In order to alleviate the over-smoothing effect, we propose a generation algorithm considering not only the HMM likelihood maximized in the conventional algorithm but also a likelihood for a global variance (GV) of the generated trajectory. The latter likelihood works as a penalty for the over-smoothing, i.e., a reduction of the GV of the generated trajectory. The result of a perceptual evaluation demonstrates that the proposed algorithm causes considerably large improvements in the naturalness of synthetic speech.
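The effect the abstract describes can be illustrated with a small sketch. It is not the authors' algorithm: the moving average is only a crude stand-in for the over-smoothing of the conventional generation step, the diagonal-Gaussian form of the GV likelihood and the values `gv_mean` and `gv_var` are assumptions made for illustration, and all function names are hypothetical. The sketch shows that smoothing a trajectory reduces its global variance (the per-dimension variance over time) and that a likelihood defined on the GV then scores the smoothed trajectory lower, which is the sense in which it acts as a penalty.

```python
import math
import random


def global_variance(trajectory):
    """Per-dimension variance over time of a trajectory (a list of frames)."""
    n_frames = len(trajectory)
    n_dims = len(trajectory[0])
    means = [sum(frame[d] for frame in trajectory) / n_frames for d in range(n_dims)]
    return [
        sum((frame[d] - means[d]) ** 2 for frame in trajectory) / n_frames
        for d in range(n_dims)
    ]


def moving_average(trajectory, width=5):
    """Crude stand-in for over-smoothing: average each frame with its neighbours."""
    half = width // 2
    smoothed = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - half): t + half + 1]
        smoothed.append([sum(col) / len(window) for col in zip(*window)])
    return smoothed


def gv_log_likelihood(gv, gv_mean, gv_var):
    """Log-density of a GV vector under a diagonal Gaussian (assumed form)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (g - m) ** 2 / v)
        for g, m, v in zip(gv, gv_mean, gv_var)
    )


# Synthetic one-dimensional "natural" trajectory: a sinusoid plus noise.
random.seed(0)
natural = [[math.sin(0.3 * t) + random.gauss(0, 0.3)] for t in range(200)]
smoothed = moving_average(natural, width=9)

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)

# Hypothetical GV model: centred on the natural GV with an arbitrary spread.
gv_mean, gv_var = gv_natural, [0.01]
ll_natural = gv_log_likelihood(gv_natural, gv_mean, gv_var)
ll_smoothed = gv_log_likelihood(gv_smoothed, gv_mean, gv_var)

print("GV natural:", gv_natural, "GV smoothed:", gv_smoothed)
print("GV log-likelihood natural:", ll_natural, "smoothed:", ll_smoothed)
```

In the proposed algorithm this GV term is considered together with the HMM likelihood during generation; how the two are combined and optimised is not specified in this record, so the sketch only evaluates the GV term on fixed trajectories.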
Pages: 816-824
Number of pages: 9
Related papers
50 in total
  • [1] A speech parameter generation algorithm using local variance for HMM-based speech synthesis
    Chunwijitra, Vataya
    Nose, Takashi
    Kobayashi, Takao
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1150 - 1153
  • [2] A Parameter Generation Algorithm Using Local Variance for HMM-Based Speech Synthesis
    Nose, Takashi
    Chunwijitra, Vataya
    Kobayashi, Takao
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 221 - 228
  • [3] PARAMETER GENERATION ALGORITHM CONSIDERING MODULATION SPECTRUM FOR HMM-BASED SPEECH SYNTHESIS
    Takamichi, Shinnosuke
    Toda, Tomoki
    Black, Alan W.
    Nakamura, Satoshi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4210 - 4214
  • [4] TRAJECTORY TRAINING CONSIDERING GLOBAL VARIANCE FOR HMM-BASED SPEECH SYNTHESIS
    Toda, Tomoki
    Young, Steve
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4025 - +
  • [5] SPEECH PARAMETER GENERATION CONSIDERING LSP ORDERING PROPERTY FOR HMM-BASED SPEECH SYNTHESIS
    Qian, Shijun
    Wang, Huanliang
    Pei, Wenjiang
    Zou, Ping
    Wang, Kai
    [J]. 2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 330 - 334
  • [6] Speech parameter generation algorithms for HMM-based speech synthesis
    Tokuda, K
    Yoshimura, T
    Masuko, T
    Kobayashi, T
    Kitamura, T
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1315 - 1318
  • [7] Minimum generation error criterion considering global/local variance for HMM-based speech synthesis
    Wu, Yi-Jian
    Zen, Heiga
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4621 - 4624
  • [8] Parameter Generation Considering LSP Ordering Property for HMM-Based Speech Synthesis
    Qian, Shijun
    Wang, Huanliang
    Pei, Wenjiang
    Wang, Kai
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2012, 19 (08) : 467 - 470
  • [9] AN OPTIMIZATION ALGORITHM OF INDEPENDENT MEAN AND VARIANCE PARAMETER TYING STRUCTURES FOR HMM-BASED SPEECH SYNTHESIS
    Takaki, Shinji
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4700 - 4703
  • [10] Global Variance Modeling on the Log Power Spectrum of LSPs for HMM-based Speech Synthesis
    Ling, Zhen-Hua
    Hu, Yu
    Dai, Li-Rong
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 825 - 828