Cepstrum-domain acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments

Cited by: 36
Authors
Kim, HK [1 ]
Rose, RC
Institutions
[1] Kwangju Inst Sci & Technol, Dept Informat & Commun, Kwangju 500712, South Korea
[2] AT&T Labs Res, Florham Pk, NJ 07932 USA
Source
IEEE Transactions on Speech and Audio Processing
Keywords
acoustic feature compensation; cepstrum compensation; noise-robust front-end; speech enhancement; speech recognition;
DOI
10.1109/TSA.2003.815515
CLC Classification Number
O42 [Acoustics];
Discipline Classification Codes
070206 ; 082403 ;
Abstract
This paper presents a set of acoustic feature pre-processing techniques applied to improving automatic speech recognition (ASR) performance on noisy speech recognition tasks. The principal contribution of this paper is an approach for cepstrum-domain feature compensation in ASR that is motivated by techniques for decomposing speech and noise originally developed for noisy speech enhancement. This approach is applied in combination with other feature compensation algorithms to compensate ASR features obtained from a mel-filterbank cepstrum coefficient front-end. Performance comparisons are made with respect to the application of the minimum mean squared error log spectral amplitude (MMSE-LSA) estimator based speech enhancement algorithm prior to feature analysis. An experimental study is presented in which the feature compensation approaches described in the paper are found to greatly reduce ASR word error rate compared to uncompensated features under environmentally and channel mismatched conditions.
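The abstract refers to features obtained from a mel-filterbank cepstrum coefficient front-end. Below is a minimal NumPy sketch of such a front-end for a single frame; the sampling rate, filter count, and number of cepstra are illustrative defaults, not the paper's exact configuration.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale warping of frequency in Hz.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale, spanning 0..sr/2.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_cepstrum_frame(frame, sr=8000, n_filters=23, n_ceps=13):
    # Windowed power spectrum -> mel filterbank energies -> log -> DCT-II.
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    mel_energies = mel_filterbank(n_filters, n_fft, sr) @ spec
    log_e = np.log(np.maximum(mel_energies, 1e-10))
    n = np.arange(n_filters)
    # DCT-II decorrelates the log energies, yielding cepstral coefficients.
    return np.array([np.sum(log_e * np.cos(np.pi * q * (2 * n + 1)
                                           / (2 * n_filters)))
                     for q in range(n_ceps)])
```

Compensation methods of the kind compared in the paper operate on the cepstral vectors produced by a front-end like this one, rather than on the waveform itself.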
Pages: 435-446
Page count: 12
Related Papers
44 items in total
  • [21] Phoneme class based feature adaptation for mismatch acoustic modeling and recognition of distant noisy speech
    Uluskan S.
    Sangwan A.
    Hansen J.H.L.
    International Journal of Speech Technology, 2017, 20 (4) : 799 - 811
  • [22] Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation
    Tsao, Yu
    Lu, Xugang
    Dixon, Paul
    Hu, Ting-yao
    Matsuda, Shigeki
    Hori, Chiori
    COMPUTER SPEECH AND LANGUAGE, 2014, 28 (03): : 709 - 726
  • [23] A VTS-BASED FEATURE COMPENSATION APPROACH TO NOISY SPEECH RECOGNITION USING MIXTURE MODELS OF DISTORTION
    Du, Jun
    Huo, Qiang
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7078 - 7082
  • [24] Feature extraction based on zero-crossings with peak amplitudes for robust speech recognition in noisy environments
    Kim, DS
    Jeong, JH
    Kim, JW
    Lee, SY
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 61 - 64
  • [25] Online feature compensation using modified quantile based noise estimation for robust speech recognition
    Lee, Heungkyu
    Kwon, Ohil
    Kim, June
    ADVANCES IN INTELLIGENT IT: ACTIVE MEDIA TECHNOLOGY 2006, 2006, 138 : 236 - 242
  • [26] Voice Activity Detection Method Using Psycho Acoustic Model Based on Speech Energy Maximization in Noisy Environments
    Choi, Gab-Keun
    Kim, Soon-Hyob
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2009, 28 (05): : 447 - 453
  • [27] A Multichannel Noise Reduction Front-end based on psychoacoustics for robust speech recognition in highly noisy environments
    Cifani, Simone
    Principi, Emanuele
    Rocchi, Cesare
    Squartini, Stefano
    Piazza, Francesco
    2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 173 - 176
  • [28] Noise robust speech recognition using feature compensation based on polynomial regression of utterance SNR
    Cui, XD
    Alwan, A
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (06): : 1161 - 1172
  • [29] IVN-Based Joint Training Of GMM And HMMs Using An Improved VTS-Based Feature Compensation For Noisy Speech Recognition
    Du, Jun
    Huo, Qiang
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1226 - 1229
  • [30] Robust Speech Analysis Based on Source-Filter Model Using Multivariate Empirical Mode Decomposition in Noisy Environments
    Boonkla, Surasak
    Unoki, Masashi
    Makhanov, Stanislav S.
    SPEECH AND COMPUTER, 2016, 9811 : 580 - 587