Minimum segmentation error based discriminative training for speech synthesis application

Authors
Wu, YJ
Kawai, H
Ni, JF
Wang, RH
Affiliations
Keywords
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In the conventional HMM-based segmentation method, HMM training is based on the maximum likelihood estimation (MLE) criterion, which ties the segmentation task to the problem of distribution estimation. The HMMs are built to identify phonetic segments, not to detect boundaries. This inconsistency between training and application limits segmentation performance. In this paper, we adopt a discriminative training method and introduce a new criterion, named Minimum Segmentation Error (MSGE), for HMM training. In this method, a loss function directly related to the segmentation error is defined. By minimizing the overall empirical loss with the Generalized Probabilistic Descent (GPD) algorithm, the segmentation error is also minimized. Results on both Chinese and Japanese data show that segmentation accuracy is improved. Moreover, the method is robust even when knowledge of HMM modeling is limited, e.g. when the number of states is not optimized.
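The idea of minimizing a smoothed, error-related loss with a GPD-style update can be illustrated with a toy sketch. This is not the authors' MSGE formulation, which the abstract does not spell out: the single bias parameter `theta`, the sigmoid smoothing, the 10 ms tolerance, the step-size schedule, and the boundary values are all assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def smoothed_loss(pred, ref, tol=10.0, gamma=0.2):
    # Smooth stand-in for "boundary error exceeds tol" (assumed form).
    return sigmoid(gamma * (abs(pred - ref) - tol))


def loss_gradient(pred, ref, tol=10.0, gamma=0.2):
    # Derivative of smoothed_loss with respect to the predicted boundary.
    err = pred - ref
    sign = (err > 0) - (err < 0)
    l = smoothed_loss(pred, ref, tol, gamma)
    return gamma * l * (1.0 - l) * sign


def train_gpd(initial, refs, epochs=50, eps0=20.0):
    # Sample-by-sample descent with a linearly decreasing step size.
    theta = 0.0  # hypothetical bias correction added to every boundary
    total = epochs * len(refs)
    step = 0
    for _ in range(epochs):
        for x, r in zip(initial, refs):
            eps = eps0 * (1.0 - step / total)
            theta -= eps * loss_gradient(x + theta, r)
            step += 1
    return theta


def mean_abs_error(initial, refs, theta):
    return sum(abs(x + theta - r) for x, r in zip(initial, refs)) / len(refs)


# Made-up boundaries in ms: initial estimates sit about 15 ms late.
refs = [100.0, 250.0, 400.0, 520.0, 700.0]
initial = [113.0, 266.0, 415.0, 538.0, 714.0]

theta = train_gpd(initial, refs)
print("error before:", mean_abs_error(initial, refs, 0.0))
print("error after: ", mean_abs_error(initial, refs, theta))
```

In this toy setting the update pulls `theta` toward the value that cancels the systematic offset, so the mean boundary error drops. In an actual HMM system the gradient would be taken with respect to the model parameters rather than a single offset.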
Pages: 629-632
Page count: 4