A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition

Cited by: 16
Authors
Xiao, Xiong [1]
Li, Jinyu [2]
Chng, Eng Siong [1]
Li, Haizhou [1, 3]
Lee, Chin-Hui [4]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Microsoft Corp, Redmond, WA 98052 USA
[3] Inst Infocomm Res, Singapore 138632, Singapore
[4] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Keywords
Aurora task; discriminative training; large margin; robust speech recognition; histogram equalization; normalization; compensation; enhancement; domain; environment; features; spectra
DOI
10.1109/TASL.2009.2031236
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we explore the generalization capability of acoustic models for improving speech recognition robustness against noise distortions. While generalization in statistical learning theory originally refers to a model's ability to perform well on unseen test data drawn from the same distribution as the training data, we show that good generalization capability is also desirable in mismatched cases. One way to obtain such general models is to use a margin-based training method, e.g., soft-margin estimation (SME), which enlarges the margins between competing models and thereby provides some tolerance to acoustic mismatches without detailed knowledge of the distortion mechanisms. Experimental results on the Aurora-2 and Aurora-3 connected digit string recognition tasks demonstrate that, by improving the model's generalization capability through SME training, speech recognition performance can be significantly improved in both matched and low-to-medium mismatched testing cases with no language model constraints. Recognition results show that SME indeed performs better with mean and variance normalization than without it, and therefore provides a benefit complementary to conventional feature normalization techniques, such that the two can be combined to further improve system performance. Although this study focuses on noisy speech recognition, we believe the proposed margin-based learning framework can be extended to deal with different types of distortions and robustness issues in other machine learning applications.
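The abstract names mean and variance normalization as a conventional feature normalization technique. As a rough illustration only, the sketch below normalizes each feature dimension of one utterance to zero mean and unit variance. The per-utterance statistics, the function name `mean_variance_normalize`, and the `eps` guard are assumptions made for this example; this is not the authors' implementation and it does not show SME training.

```python
from math import sqrt
from typing import List


def mean_variance_normalize(frames: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    """Scale each feature dimension to zero mean and unit variance over one utterance.

    `frames` is a list of feature vectors (one per frame), all of the same length.
    `eps` is a hypothetical guard against division by zero for constant dimensions.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-dimension (population) standard deviation.
    stds = [sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) for d in range(dim)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)] for f in frames]


# Toy utterance: three frames with two feature dimensions on very different scales.
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = mean_variance_normalize(utterance)
```

After normalization both dimensions of the toy utterance share the same scale, which is the property that makes training and test features more comparable under mismatched conditions.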
Pages: 1158-1169
Number of pages: 12