TRAJECTORY TRAINING CONSIDERING GLOBAL VARIANCE FOR HMM-BASED SPEECH SYNTHESIS

Cited by: 13
Authors
Toda, Tomoki [1]
Young, Steve [2]
Affiliations
[1] Nara Inst Sci & Technol NAIST, Grad Sch Informat Sci, Nara, Japan
[2] Univ Cambridge, Dept Engn, Cambridge CB2 1TN, England
Keywords
speech synthesis; hidden Markov models; training criterion; trajectory likelihood; global variance
DOI
10.1109/ICASSP.2009.4960511
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents a novel method for training hidden Markov models (HMMs) for use in HMM-based speech synthesis. The primary goal of HMM parameter optimization is to ensure that parameters generated from the trained models exhibit properties similar to those of natural speech. Two major problems in conventional training are addressed: 1) the inconsistency between the training and synthesis optimization criteria; and 2) the over-smoothing caused by the statistical modeling process. The proposed method integrates the global variance (GV) criterion into a trajectory training method, giving a unified framework for training and synthesis that provides both a consistent optimization criterion and a closed-form solution for parameter generation. The experimental results demonstrate that the proposed method yields a significant improvement in the naturalness of synthetic speech.
Pages: 4025 / +
Number of pages: 2
Related papers
50 in total
  • [1] A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    Toda, Tomoki
    Tokuda, Keiichi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (05) : 816 - 824
  • [2] Minimum generation error criterion considering global/local variance for HMM-based speech synthesis
    Wu, Yi-Jian
    Zen, Heiga
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4621 - 4624
  • [3] TRAJECTORY TRAINING CONSIDERING GLOBAL VARIANCE FOR SPEECH SYNTHESIS BASED ON NEURAL NETWORKS
    Hashimoto, Kei
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5600 - 5604
  • [4] GLOBAL VARIANCE MODELING ON FREQUENCY DOMAIN DELTA LSP FOR HMM-BASED SPEECH SYNTHESIS
    Pan, Shifeng
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    Tao, Jianhua
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4716 - 4719
  • [5] Global Variance Modeling on the Log Power Spectrum of LSPs for HMM-based Speech Synthesis
    Ling, Zhen-Hua
    Hu, Yu
    Dai, Li-Rong
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 825 - 828
  • [6] Modulation Spectrum-Constrained Trajectory Training Algorithm for HMM-Based Speech Synthesis
    Takamichi, Shinnosuke
    Toda, Tomoki
    Black, Alan W.
    Nakamura, Satoshi
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1206 - 1210
  • [7] Considering Global Variance of the Log Power Spectrum Derived from Mel-Cepstrum in HMM-based Parametric Speech Synthesis
    Yin, Xiang
    Ling, Zhen-Hua
    Lei, Ming
    Dai, Li-Rong
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1146 - 1149
  • [8] Integrating Global Variance of Log Power Spectrum Derived from LSPs into MGE Training for HMM-Based Parametric Speech Synthesis
    Sun, Yu-Sheng
    Ling, Zhen-Hua
    Yin, Xiang
    Dai, Li-Rong
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 201 - 205
  • [9] Minimum generation error training for HMM-based speech synthesis
    Wu, Yi-Jian
    Wang, Ren-Hua
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 89 - 92
  • [10] Improved Training of Excitation for HMM-based Parametric Speech Synthesis
    Shiga, Yoshinori
    Toda, Tomoki
    Sakai, Shinsuke
    Kawai, Hisashi
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 809 - 812