A Study on Hidden Markov Model's Generalization Capability for Speech Recognition

Authors
Xiao, Xiong [1]
Li, Jinyu [2]
Chng, Eng Siong [1]
Li, Haizhou [3]
Lee, Chin-Hui [4]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Microsoft Corp, Redmond, WA 98052 USA
[3] Inst Infocomm Res, Singapore 138632, Singapore
[4] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Keywords
model generalization; robustness; soft margin estimation; minimum classification error; Aurora task
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In statistical learning theory, the generalization capability of a model is its ability to perform well on unseen test data drawn from the same distribution as the training data. This paper investigates how, in the context of speech recognition, generalization capability can also improve robustness when the testing and training data come from different distributions. Two discriminative training (DT) methods, minimum classification error (MCE) and soft margin estimation (SME), are used to train hidden Markov models (HMMs) with better generalization capability. Results on the Aurora-2 task show that both SME and MCE are effective in improving the margin of the model, one measure of an acoustic model's generalization capability, with SME being moderately more effective. In addition, the better generalization capability translates into more robust speech recognition performance, even when there is a significant mismatch between the training and testing data. We also apply mean and variance normalization (MVN) to preprocess the data and reduce the training-testing mismatch. After MVN, MCE and SME perform even better, as generalization capability is now more closely related to robustness. The best performance on Aurora-2 is obtained with SME, which achieves a relative error rate reduction of about 28% over the MVN baseline system. Finally, we use SME on the Aurora-3 task to demonstrate the potential of better generalization capability for improving robustness on a more realistic noisy task, and significant improvements are obtained.
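The following minimal sketch makes the quantities named in the abstract concrete. It covers per-utterance MVN of feature vectors, a margin between the correct and best competing hypothesis, the sigmoid-smoothed MCE loss, and a soft-margin (hinge) objective of the kind SME minimizes. It is written under stated assumptions and is not the authors' implementation. It operates on precomputed log-likelihood scores rather than training HMMs. The utterance-level margin (correct score minus best competing score) is a simplified stand-in for the separation measure used in SME, which may be defined differently. The function names and the hyperparameters `gamma`, `theta`, `eta`, `rho` and `lam` are hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence


def mvn(frames: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    """Per-utterance mean and variance normalization (MVN).

    Each feature dimension is shifted to zero mean and scaled to unit
    variance using statistics from the frames of this utterance only.
    `eps` guards against division by zero for constant dimensions.
    """
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dim)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
            for f in frames]


def margin(scores: Sequence[float], correct: int) -> float:
    """Simplified utterance-level margin (an assumption, not the paper's exact
    separation measure): correct-class log-likelihood minus best competitor."""
    best_competitor = max(s for j, s in enumerate(scores) if j != correct)
    return scores[correct] - best_competitor


def mce_loss(scores: Sequence[float], correct: int,
             gamma: float = 1.0, theta: float = 0.0, eta: float = 1.0) -> float:
    """Sigmoid-smoothed MCE loss for one token.

    Misclassification measure:
        d = -g_correct + (1/eta) * log(mean over competitors of exp(eta * g_j))
    Loss:
        l = 1 / (1 + exp(-gamma * d + theta))
    """
    competitors = [s for j, s in enumerate(scores) if j != correct]
    m = max(competitors)  # shift for a numerically stable log-mean-exp
    log_mean_exp = m + math.log(
        sum(math.exp(eta * (s - m)) for s in competitors) / len(competitors)) / eta
    d = -scores[correct] + log_mean_exp
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))


def sme_loss(margins: Sequence[float], rho: float, lam: float) -> float:
    """Soft-margin objective: lam / rho plus the average hinge (rho - d),
    where only tokens whose margin d falls below rho contribute."""
    hinge = sum(rho - d for d in margins if d < rho)
    return lam / rho + hinge / len(margins)


if __name__ == "__main__":
    # Hypothetical log-likelihoods: 2 utterances x 3 candidate models.
    scores = [[-10.0, -12.0, -15.0], [-20.0, -19.0, -25.0]]
    labels = [0, 0]
    margins = [margin(s, y) for s, y in zip(scores, labels)]
    print("margins:", margins)
    print("MCE losses:", [mce_loss(s, y) for s, y in zip(scores, labels)])
    print("SME objective:", sme_loss(margins, rho=1.5, lam=0.3))
```

The example highlights a design contrast between the two methods. MCE penalizes every token through a smooth sigmoid of its misclassification measure. The SME hinge term, by contrast, is zero for tokens whose margin already exceeds `rho` and grows linearly for the remaining tokens.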
Pages: 118+
Page count: 2