Minimum segmentation error based discriminative training for speech synthesis application

Authors
Wu, YJ
Kawai, H
Ni, JF
Wang, RH
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In the conventional HMM-based segmentation method, HMM training is based on the maximum likelihood estimation (MLE) criterion, which links the segmentation task to the problem of distribution estimation. The HMMs are built to identify phonetic segments, not to detect boundaries. This inconsistency between training and application limits segmentation performance. In this paper, we adopt a discriminative training method and introduce a new criterion, named Minimum Segmentation Error (MSGE), for HMM training. In this method, a loss function directly related to the segmentation error is defined. Minimizing the overall empirical loss with the Generalized Probabilistic Descent (GPD) algorithm also minimizes the segmentation error. Results on both Chinese and Japanese data show that segmentation accuracy is improved. Moreover, the method is robust even when knowledge of HMM modeling is insufficient, e.g. when the number of states is not optimized.
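The idea of minimizing a loss tied directly to boundary error can be illustrated with a toy sketch. The abstract does not give the actual form of the MSGE loss or of the HMM parameter updates, so everything below is an assumption made for illustration: a sigmoid-smoothed error count with a hypothetical tolerance `tol` (in ms), a single scalar offset `theta` standing in for the model parameters, and a GPD-style sequential update with a linearly decaying step size.

```python
import math

def smoothed_loss(err, gamma=0.5, tol=5.0):
    """Sigmoid-smoothed 0/1 loss: close to 1 when |err| exceeds tol, close to 0 otherwise.
    The sigmoid form, gamma, and tol are illustrative assumptions."""
    return 1.0 / (1.0 + math.exp(-gamma * (abs(err) - tol)))

def loss_grad(err, gamma=0.5, tol=5.0):
    """Derivative of smoothed_loss with respect to err."""
    s = smoothed_loss(err, gamma, tol)
    sign = (err > 0) - (err < 0)
    return gamma * s * (1.0 - s) * sign

def gpd_train(baseline, reference, epochs=50, eps0=2.0):
    """GPD-style training: one update per sample, step size decaying linearly to zero.
    theta is a hypothetical scalar offset added to every baseline boundary."""
    theta = 0.0
    total = epochs * len(baseline)
    step = 0
    for _ in range(epochs):
        for b, r in zip(baseline, reference):
            eps = eps0 * (1.0 - step / total)
            theta -= eps * loss_grad(b + theta - r)
            step += 1
    return theta

def mean_loss(baseline, reference, theta):
    """Empirical loss averaged over all boundaries."""
    losses = [smoothed_loss(b + theta - r) for b, r in zip(baseline, reference)]
    return sum(losses) / len(losses)

# Made-up boundary times in ms: automatic (baseline) versus manual (reference) labels.
baseline = [100.0, 250.0, 410.0, 600.0]
reference = [92.0, 243.0, 401.0, 593.0]

before = mean_loss(baseline, reference, 0.0)
theta = gpd_train(baseline, reference)
after = mean_loss(baseline, reference, theta)
print(f"offset={theta:.2f} ms, mean loss {before:.3f} -> {after:.3f}")
```

In this toy setting the baseline boundaries are all placed late, so the learned offset shifts them earlier and the empirical loss drops. A real system would apply the same kind of gradient step to the HMM parameters rather than to a single offset.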
Pages: 629-632
Number of pages: 4