Large scale discriminative training of hidden Markov models for speech recognition

Cited by: 174
Authors
Woodland, PC [1]
Povey, D [1]
Affiliation
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Source
COMPUTER SPEECH AND LANGUAGE | 2002, Vol. 16, No. 1
DOI
10.1006/csla.2001.0182
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes, and evaluates on a large scale, the lattice-based framework for discriminative training of large vocabulary speech recognition systems based on Gaussian mixture hidden Markov models (HMMs). This paper concentrates on the maximum mutual information estimation (MMIE) criterion, which has been used to train HMM systems for conversational telephone speech transcription using up to 265 hours of training data. These experiments represent the largest-scale application of discriminative training techniques for speech recognition of which the authors are aware. Details are given of the MMIE lattice-based implementation used with the extended Baum-Welch algorithm, which makes training of such large systems computationally feasible. Techniques for improving generalization using acoustic scaling and weakened language models are discussed. The overall technique has allowed the estimation of triphone and quinphone HMM parameters, which has led to significant reductions in word error rate for the transcription of conversational telephone speech relative to our best systems trained using maximum likelihood estimation (MLE). This is in contrast to some previous studies, which have concluded that there is little benefit in using discriminative training for the most difficult large vocabulary speech recognition tasks. The lattice MMIE-based discriminative training scheme is also shown to outperform the frame discrimination technique. Various properties of the lattice-based MMIE training scheme are investigated, including comparisons of different lattice processing strategies (full search and exact-match) and the effect of lattice size on performance. Furthermore, a scheme based on the linear interpolation of the MMIE and MLE objective functions is shown to reduce the danger of over-training. It is shown that HMMs trained with MMIE benefit as much as MLE-trained HMMs from applying model adaptation using maximum likelihood linear regression (MLLR). This has allowed the straightforward integration of MMIE-trained HMMs into complex multi-pass systems for transcription of conversational telephone speech and has contributed to our MMIE-trained systems giving the lowest word error rates in both the 2000 and 2001 NIST Hub5 evaluations. © 2002 Academic Press.
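For orientation, the objective functions named in the abstract are commonly written as below. This is a sketch in standard textbook notation and is not taken from this record: the symbols $\lambda$ (HMM parameters), $O_r$ (the $r$-th training utterance), $w_r$ (its reference transcription), $M_w$ (the composite model for word sequence $w$), $\kappa$ (acoustic scale) and $\alpha$ (interpolation weight) are all assumed names, and the exact form used in the paper may differ.

```latex
% MLE objective: log-likelihood of the reference transcriptions
\mathcal{F}_{\mathrm{MLE}}(\lambda)
  = \sum_{r=1}^{R} \log p_{\lambda}\!\left(O_r \mid M_{w_r}\right)

% MMIE objective with acoustic scaling (kappa is an assumed symbol):
% posterior of the reference against all competing hypotheses
\mathcal{F}_{\mathrm{MMIE}}(\lambda)
  = \sum_{r=1}^{R} \log
    \frac{p_{\lambda}\!\left(O_r \mid M_{w_r}\right)^{\kappa} P(w_r)}
         {\sum_{\hat{w}} p_{\lambda}\!\left(O_r \mid M_{\hat{w}}\right)^{\kappa} P(\hat{w})}

% Linear interpolation of the two objectives (alpha is an assumed symbol)
\mathcal{F}_{\mathrm{interp}}(\lambda)
  = \alpha \, \mathcal{F}_{\mathrm{MMIE}}(\lambda)
  + (1 - \alpha) \, \mathcal{F}_{\mathrm{MLE}}(\lambda),
  \qquad 0 \le \alpha \le 1
```

In this reading, the denominator sum over $\hat{w}$ is what a lattice would approximate, $P(\cdot)$ is where a weakened language model would enter, and setting $\alpha = 1$ recovers pure MMIE while $\alpha = 0$ recovers MLE.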
Pages: 25-47
Number of pages: 23
Related papers
50 in total
  • [1] Large-margin discriminative training of hidden Markov models for speech recognition
    Yu, Dong
    Deng, Li
    [J]. ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS, 2007: 429-+
  • [2] Discriminative training of hidden Markov models by multiobjective optimization for visual speech recognition
    Lee, JS
    Park, CH
    [J]. PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5, 2005: 2053-2058
  • [3] Large margin hidden Markov models for speech recognition
    Jiang, Hui
    Li, Xinwei
    Liu, Chaojun
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (05): 1584-1595
  • [4] Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models
    Sha, Fei
    Saul, Lawrence K.
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 313-+
  • [5] BAYESIAN LARGE MARGIN HIDDEN MARKOV MODELS FOR SPEECH RECOGNITION
    Chen, Jung-Chun
    Chien, Jen-Tzung
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009: 3765-3768
  • [6] OVERVIEW OF LARGE SCALE OPTIMIZATION FOR DISCRIMINATIVE TRAINING IN SPEECH RECOGNITION
    Kanevsky, Dimitri
    Heigold, Georg
    Wright, Stephen
    Ney, Hermann
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 5233-5236
  • [7] A discriminative training algorithm for hidden Markov models
    Ben-Yishai, A
    Burshtein, D
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2004, 12 (03): 204-217
  • [8] HIDDEN MARKOV MODELS IN SPEECH RECOGNITION
    Krajcovic, J.
    Hrncar, M.
    Muzikarova, E.
    [J]. ADVANCES IN ELECTRICAL AND ELECTRONIC ENGINEERING, 2008, 7 (1-2): 250-252
  • [9] A new look at discriminative training for hidden Markov models
    He, Xiaodong
    Deng, Li
    [J]. PATTERN RECOGNITION LETTERS, 2007, 28 (11): 1285-1294
  • [10] DISCRIMINATIVE TRAINING FOR BAYESIAN SENSING HIDDEN MARKOV MODELS
    Saon, George
    Chien, Jen-Tzung
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011: 5316-5319