Minimum segmentation error based discriminative training for speech synthesis application

Cited by: 0
Authors
Wu, YJ
Kawai, H
Ni, JF
Wang, RH
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
In the conventional HMM-based segmentation method, HMM training is based on the maximum likelihood estimation (MLE) criterion, which ties the segmentation task to the problem of distribution estimation. The HMMs are built to identify phonetic segments, not to detect boundaries. This inconsistency between training and application limits segmentation performance. In this paper, we adopt a discriminative training method and introduce a new criterion, named Minimum Segmentation Error (MSGE), for HMM training. In this method, a loss function directly related to the segmentation error is defined. By minimizing the overall empirical loss with the Generalized Probabilistic Descent (GPD) algorithm, the segmentation error is also minimized. Results on both Chinese and Japanese data show that segmentation accuracy is improved. Moreover, the method is robust even when knowledge about HMM modeling is insufficient, e.g., when the number of states is not optimized.
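The abstract does not give the form of the loss function, so the following is only a toy sketch of the general idea: a smoothed loss tied to boundary error, minimized sample by sample with a decaying step size in the style of GPD. Everything specific here is an assumption made for illustration: the sigmoid smoothing, the tolerance `tau`, the slope `gamma`, the single trainable `bias` (a real system would update HMM parameters), and the example boundary times.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def boundary_loss(pred, ref, gamma=0.5, tau=5.0):
    """Smoothed 0/1 loss (assumed form): close to 1 when the boundary
    error |pred - ref| exceeds the tolerance tau (ms), close to 0 otherwise."""
    return sigmoid(gamma * (abs(pred - ref) - tau))


def mean_loss(raw, ref, bias, gamma=0.5, tau=5.0):
    """Empirical loss averaged over all boundaries for a given bias."""
    losses = [boundary_loss(r + bias, t, gamma, tau) for r, t in zip(raw, ref)]
    return sum(losses) / len(losses)


def gpd_train(raw, ref, epochs=200, eps0=2.0, gamma=0.5, tau=5.0):
    """GPD-style training of one hypothetical parameter: a bias (ms)
    added to every automatically placed boundary."""
    bias = 0.0
    total_steps = epochs * len(raw)
    step = 0
    for _ in range(epochs):
        for r, t in zip(raw, ref):
            err = (r + bias) - t
            loss = sigmoid(gamma * (abs(err) - tau))
            sign = (err > 0) - (err < 0)
            # Derivative of the sigmoid loss with respect to the bias.
            grad = gamma * loss * (1.0 - loss) * sign
            # Step size decays linearly towards zero over training.
            eps = eps0 * (1.0 - step / total_steps)
            bias -= eps * grad
            step += 1
    return bias


# Hypothetical boundary times in ms: automatic placement vs. manual reference.
raw_boundaries = [100.0, 250.0, 400.0, 610.0]
ref_boundaries = [110.0, 262.0, 409.0, 621.0]

trained_bias = gpd_train(raw_boundaries, ref_boundaries)
loss_before = mean_loss(raw_boundaries, ref_boundaries, 0.0)
loss_after = mean_loss(raw_boundaries, ref_boundaries, trained_bias)
```

In this toy setting the automatic boundaries are all early, so training moves `bias` in the positive direction and `loss_after` ends up below `loss_before`. The point of the sketch is only that the quantity being minimized is tied to boundary error rather than to likelihood.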
Pages: 629-632
Page count: 4