A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition

Cited: 16
Authors
Xiao, Xiong [1 ]
Li, Jinyu [2 ]
Chng, Eng Siong [1 ]
Li, Haizhou [1 ,3 ]
Lee, Chin-Hui [4 ]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Microsoft Corp, Redmond, WA 98052 USA
[3] Inst Infocomm Res, Singapore 138632, Singapore
[4] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Keywords
Aurora task; discriminative training; large margin; robust speech recognition; HISTOGRAM EQUALIZATION; NORMALIZATION; COMPENSATION; ENHANCEMENT; DOMAIN; ENVIRONMENT; FEATURES; SPECTRA
DOI
10.1109/TASL.2009.2031236
CLC Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
In this paper, we explore the generalization capability of acoustic models for improving speech recognition robustness against noise distortions. While generalization in statistical learning theory originally refers to a model's ability to generalize well on unseen testing data drawn from the same distribution as the training data, we show that good generalization capability is also desirable for mismatched cases. One way to obtain such general models is to use a margin-based model training method, e.g., soft-margin estimation (SME), which enables some tolerance to acoustic mismatches without detailed knowledge of the distortion mechanisms, by enhancing the margins between competing models. Experimental results on the Aurora-2 and Aurora-3 connected digit string recognition tasks demonstrate that, by improving the model's generalization capability through SME training, speech recognition performance can be significantly improved in both matched and low- to medium-mismatch testing cases with no language model constraints. Recognition results show that SME indeed performs better with than without mean and variance normalization, and therefore provides a complementary benefit to conventional feature normalization techniques, such that the two can be combined to further improve system performance. Although this study focuses on noisy speech recognition, we believe the proposed margin-based learning framework can be extended to deal with different types of distortions and robustness issues in other machine learning applications.
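The abstract reports that SME works best on top of mean and variance normalization of the features. A minimal per-utterance sketch of that conventional normalization step is given below; the function name `cmvn` and the NumPy-based formulation are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization (CMVN).

    features: (num_frames, num_coeffs) matrix of cepstral features.
    Each coefficient dimension is shifted to zero mean and scaled to
    unit variance across the utterance, which removes slowly varying
    channel/noise offsets before acoustic model training or decoding.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Guard against constant dimensions to avoid division by zero.
    return (features - mean) / np.maximum(std, eps)
```

In practice, toolkits apply this per utterance or per speaker before model training; the paper's point is that margin-based training (SME) then adds robustness on top of, rather than instead of, such normalization.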
Pages: 1158-1169
Page count: 12
Related Papers
50 records
  • [31] ROBUST SPEECH RECOGNITION USING MULTIPLE PRIOR MODELS FOR SPEECH RECONSTRUCTION
    Narayanan, Arun
    Zhao, Xiaojia
    Wang, DeLiang
    Fosler-Lussier, Eric
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4800 - 4803
  • [32] EFFICIENT TRAINING OF ACOUSTIC MODELS FOR REVERBERATION-ROBUST MEDIUM-VOCABULARY AUTOMATIC SPEECH RECOGNITION
    Sehr, Armin
    Barfuss, Hendrik
    Hofmann, Christian
    Maas, Roland
    Kellermann, Walter
    [J]. 2014 4TH JOINT WORKSHOP ON HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS (HSCMA), 2014, : 177 - 181
  • [33] Robust speech recognition using probabilistic union models
    Ming, J
    Jancovic, P
    Smith, FJ
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2002, 10 (06): : 403 - 414
  • [34] ROBUST SPEECH RECOGNITION USING MULTIVARIATE COPULA MODELS
    Bayestehtashk, Alireza
    Shafran, Izhak
    Babaeian, Amir
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5890 - 5894
  • [35] Joint Training of Speech Separation, Filterbank and Acoustic Model for Robust Automatic Speech Recognition
    Wang, Zhong-Qiu
    Wang, DeLiang
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2839 - 2843
  • [36] A STUDY ON DATA AUGMENTATION OF REVERBERANT SPEECH FOR ROBUST SPEECH RECOGNITION
    Ko, Tom
    Peddinti, Vijayaditya
    Povey, Daniel
    Seltzer, Michael L.
    Khudanpur, Sanjeev
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5220 - 5224
  • [37] Factorial Speech Processing Models for Noise-Robust Automatic Speech Recognition
    Khademian, Mahdi
    Homayounpour, Mohammad Mehdi
    [J]. 2015 23RD IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2015, : 637 - 642
  • [38] Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone Speech
    Sahidullah, Md
    Hautamaki, Rosa Gonzalez
    Thomsen, Dennis Alexander Lehmann
    Kinnunen, Tomi
    Tan, Zheng-Hua
    Hautamaki, Ville
    Parts, Robert
    Pitkanen, Martti
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1720 - 1724
  • [39] FUSION OF STANDARD AND ALTERNATIVE ACOUSTIC SENSORS FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Heracleous, Panikos
    Even, Jani
    Ishi, Carlos T.
    Miyashita, Takahiro
    Hagita, Norihiro
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4837 - 4840
  • [40] Roles of high-fidelity acoustic modeling in robust speech recognition
    Deng, Li
    [J]. 2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 1 - 13