A Study on Hidden Markov Model's Generalization Capability for Speech Recognition

Cited by: 0

Authors:

Xiao, Xiong [1]

Li, Jinyu [2]

Chng, Eng Siong [1]

Li, Haizhou [3]

Lee, Chin-Hui [4]

Affiliations:

[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore

[2] Microsoft Corp, Redmond, WA 98052 USA

[3] Inst Infocomm Res, Singapore 138632, Singapore

[4] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

Source:

2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009) | 2009

Keywords:

model generalization; robustness; soft margin estimation; minimum classification error; Aurora task;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In statistical learning theory, the generalization capability of a model is its ability to perform well on unseen test data that follow the same distribution as the training data. This paper investigates how generalization capability can also improve robustness when the testing and training data come from different distributions, in the context of speech recognition. Two discriminative training (DT) methods are used to train the hidden Markov model (HMM) for better generalization capability, namely the minimum classification error (MCE) and the soft-margin estimation (SME) methods. Results on the Aurora-2 task show that both SME and MCE are effective in improving one measure of the acoustic model's generalization capability, i.e. the margin of the model, with SME being moderately more effective. In addition, the better generalization capability translates into better robustness of speech recognition performance, even when there is significant mismatch between the training and testing data. We also apply mean and variance normalization (MVN) to preprocess the data and reduce the training-testing mismatch. After MVN, MCE and SME perform even better, as the generalization capability is now more closely related to robustness. The best performance on Aurora-2 is obtained with SME, which achieves about 28% relative error rate reduction over the MVN baseline system. Finally, we also use SME to demonstrate the potential of better generalization capability for improving robustness on a more realistic noisy task, Aurora-3, where significant improvements are obtained.
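As an illustration of the MVN preprocessing step named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that statistics are computed per utterance and per feature dimension; the abstract does not state this. The function name `mvn`, the `eps` floor, and the toy input are hypothetical.

```python
from statistics import fmean, pstdev


def mvn(frames, eps=1e-8):
    """Scale each feature dimension of one utterance to zero mean, unit variance.

    `frames` is a list of equal-length feature vectors (one per time frame).
    Per-utterance statistics are an assumption of this sketch.
    """
    # Transpose so that each tuple holds one feature dimension over time.
    dims = list(zip(*frames))
    means = [fmean(d) for d in dims]
    # Population standard deviation, floored to avoid division by zero.
    stds = [max(pstdev(d), eps) for d in dims]
    return [
        [(x - m) / s for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


if __name__ == "__main__":
    # Toy utterance: three frames, two feature dimensions (hypothetical values).
    utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    for frame in mvn(utterance):
        print(frame)
```

Because both dimensions in the toy utterance differ only by scale, they map to the same normalized values, which shows how MVN removes offset and gain differences between training and testing conditions.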


Pages: 118 / +

Number of pages: 2
