Large scale discriminative training of hidden Markov models for speech recognition

Times cited: 174

Authors:

Woodland, PC [1]

Povey, D [1]

Affiliation:

[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England

Source:

COMPUTER SPEECH AND LANGUAGE | 2002 / Vol. 16 / No. 01

DOI:

10.1006/csla.2001.0182

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper describes, and evaluates on a large scale, the lattice-based framework for discriminative training of large vocabulary speech recognition systems based on Gaussian mixture hidden Markov models (HMMs). This paper concentrates on the maximum mutual information estimation (MMIE) criterion, which has been used to train HMM systems for conversational telephone speech transcription using up to 265 hours of training data. These experiments represent the largest-scale application of discriminative training techniques for speech recognition of which the authors are aware. Details are given of the MMIE lattice-based implementation used with the extended Baum-Welch algorithm, which makes training of such large systems computationally feasible. Techniques for improving generalization using acoustic scaling and weakened language models are discussed. The overall technique has allowed the estimation of triphone and quinphone HMM parameters, which has led to significant reductions in word error rate for the transcription of conversational telephone speech relative to our best systems trained using maximum likelihood estimation (MLE). This is in contrast to some previous studies, which have concluded that there is little benefit in using discriminative training for the most difficult large vocabulary speech recognition tasks. The lattice MMIE-based discriminative training scheme is also shown to outperform the frame discrimination technique. Various properties of the lattice-based MMIE training scheme are investigated, including comparisons of different lattice processing strategies (full search and exact-match) and the effect of lattice size on performance. Furthermore, a scheme based on the linear interpolation of the MMIE and MLE objective functions is shown to reduce the danger of over-training. It is shown that HMMs trained with MMIE benefit as much as MLE-trained HMMs from applying model adaptation using maximum likelihood linear regression (MLLR).
This has allowed the straightforward integration of MMIE-trained HMMs into complex multi-pass systems for transcription of conversational telephone speech and has contributed to our MMIE-trained systems giving the lowest word error rates in both the 2000 and 2001 NIST Hub5 evaluations. (C) 2002 Academic Press.
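The abstract names three ingredients: the MMIE criterion, acoustic scaling, and a linear interpolation of the MMIE and MLE objectives. The following is a toy-scale illustration under stated assumptions, not the authors' implementation. It assumes a short explicit list of competing hypotheses stands in for the recognition lattice, follows one common convention in which only the acoustic log-likelihoods are scaled by κ (`kappa`), and uses a hypothetical interpolation weight `alpha`, i.e. F = α·F_MMIE + (1 − α)·F_MLE. Parameter updates with the extended Baum-Welch algorithm are not shown, and all function names and inputs are illustrative.

```python
import math
from typing import Iterable, List, Sequence, Tuple

# One utterance: (acoustic log-likelihoods, LM log-probs, index of the reference).
# These inputs are hypothetical; a real system would derive them from lattices.
Utterance = Tuple[List[float], List[float], int]


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmie_utterance(ac_loglik: List[float], lm_logprob: List[float],
                   ref_index: int, kappa: float) -> float:
    """Log posterior of the reference transcription among the hypotheses.

    The hypothesis list plays the role of the denominator lattice and must
    contain the reference (the numerator). Assumption: only the acoustic
    log-likelihoods are scaled by kappa; LM log-probs are used as given.
    """
    scores = [kappa * ac + lm for ac, lm in zip(ac_loglik, lm_logprob)]
    return scores[ref_index] - log_sum_exp(scores)


def interpolated_objective(utterances: Iterable[Utterance],
                           kappa: float, alpha: float) -> float:
    """alpha * F_MMIE + (1 - alpha) * F_MLE, summed over utterances.

    Hypothetical parameterization of the MMIE/MLE linear interpolation.
    """
    f_mmie = 0.0
    f_mle = 0.0
    for ac, lm, ref in utterances:
        f_mmie += mmie_utterance(ac, lm, ref, kappa)
        f_mle += ac[ref]  # MLE scores only the reference transcription
    return alpha * f_mmie + (1.0 - alpha) * f_mle
```

For example, `mmie_utterance([-10.0, -10.0], [-1.0, -1.0], 1, 0.5)` gives log 0.5 (about -0.693), since two hypotheses with equal scaled scores split the posterior evenly. Setting `alpha=0.0` reduces the interpolated objective to the plain MLE log-likelihood of the reference transcriptions.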


Pages: 25-47

Number of pages: 23
