Robust feature extraction based on an asymmetric level-dependent auditory filterbank and a subband spectrum enhancement technique

Cited by: 17
Authors
Alam, Md Jahangir [1 ,2 ]
Kenny, Patrick [2 ]
O'Shaughnessy, Douglas [1 ]
Affiliations
[1] Univ Quebec, INRS EMT, Montreal, PQ H3C 3P8, Canada
[2] CRIM, Montreal, PQ H3C 3P8, Canada
Keywords
Speech recognition; Compressive gammachirp; Auditory spectrum enhancement; Feature normalization; SPEECH; NOISE; COMPENSATION; RECOGNITION; SUPPRESSION; ADAPTATION; MODEL
DOI
10.1016/j.dsp.2014.03.001
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology]
Subject Classification Codes
0808; 0809
Abstract
In this paper we introduce a robust feature extractor, dubbed robust compressive gammachirp filterbank cepstral coefficients (RCGCC), based on an asymmetric, level-dependent compressive gammachirp filterbank and a sigmoid-shaped weighting rule for enhancing speech spectra in the auditory domain. The goal of this work is to improve the robustness of speech recognition systems in additive noise and real reverberant environments. As a post-processing scheme we employ a short-time feature normalization technique called short-time cepstral mean and scale normalization (STCMSN), which reduces the mismatch between training and test cepstra by adjusting the mean and scale of the cepstral features. To evaluate the proposed feature extractor in the context of speech recognition, we use the standard noisy AURORA-2 connected-digit corpus, the meeting recorder digits (MRDs) subset of the AURORA-5 corpus, and the AURORA-4 LVCSR corpus, which represent additive noise, reverberant acoustic conditions, and additive noise combined with varying microphone channel conditions, respectively. The ETSI advanced front-end (ETSI-AFE), the recently proposed power normalized cepstral coefficients (PNCC), and conventional MFCC and PLP features are used for comparison. Experimental speech recognition results demonstrate that the proposed method is robust in both additive noise and reverberant environments. It yields results comparable to those of ETSI-AFE and PNCC on the AURORA-2 and AURORA-4 corpora, and provides considerable improvements over the other feature extractors on the AURORA-5 corpus. (c) 2014 Elsevier Inc. All rights reserved.
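The STCMSN post-processing step is described only at a high level in the abstract. The sketch below illustrates one plausible reading of it, assuming the "scale" is estimated as the per-coefficient short-time standard deviation and that a sliding window of roughly 300 frames (about 3 s at a 10 ms hop) is used; the function name, window length, and scale definition are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def stcmsn(cepstra, win=301, eps=1e-8):
    """Short-time cepstral mean and scale normalization (illustrative sketch).

    cepstra : (num_frames, num_coeffs) float array of cepstral features.
    win     : odd sliding-window length in frames (assumed ~3 s at a 10 ms hop).

    Each frame is normalized by the mean and scale estimated over a window
    centred on that frame, so slowly varying channel and noise offsets are
    reduced without relying on utterance-level statistics.
    """
    num_frames, _ = cepstra.shape
    half = win // 2
    out = np.empty_like(cepstra)
    for t in range(num_frames):
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        seg = cepstra[lo:hi]
        mu = seg.mean(axis=0)
        # Assumed scale estimate: short-time standard deviation per coefficient.
        scale = seg.std(axis=0) + eps
        out[t] = (cepstra[t] - mu) / scale
    return out
```

Because the statistics are local, this kind of normalization can track non-stationary mismatch between training and test conditions, which is the motivation the abstract gives for preferring a short-time scheme over utterance-level cepstral mean normalization.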
Pages: 147-157
Number of pages: 11