Robust Feature Extraction Using Modulation Filtering of Autoregressive Models

Cited by: 39
Authors
Ganapathy, Sriram [1]
Mallidi, Sri Harish [2]
Hermansky, Hynek [2]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
Autoregressive modeling; feature extraction; modulation filtering; speaker and language recognition; FRONT-END; SPEECH; RECOGNITION;
DOI
10.1109/TASLP.2014.2329190
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speaker and language recognition in noisy and degraded channel conditions continues to be a challenging problem, mainly due to the mismatch between clean training and noisy test conditions. In the presence of noise, the most reliable portions of the signal are the high-energy regions, which can be used for robust feature extraction. In this paper, we propose a front-end processing scheme based on autoregressive (AR) models that represent the high-energy regions with good accuracy, followed by a modulation filtering process. The AR model of the spectrogram is derived using two separable time and frequency AR transforms. The first AR model (temporal AR model) of the sub-band Hilbert envelopes is derived using frequency domain linear prediction (FDLP). This is followed by a spectral AR model applied on the FDLP envelopes. The output 2-D AR model represents a low-pass modulation filtered spectrogram of the speech signal. Band-pass modulation filtered spectrograms can further be derived by dividing two AR models with different model orders (cut-off frequencies). The modulation filtered spectrograms are converted to cepstral coefficients and are used for a speaker recognition task in noisy and reverberant conditions. Various speaker recognition experiments are performed with clean and noisy versions of the NIST-2010 speaker recognition evaluation (SRE) database using a state-of-the-art speaker recognition system. In these experiments, the proposed front-end analysis provides substantial improvements (relative improvements of up to 25%) compared to baseline techniques. Furthermore, we also illustrate the generalizability of the proposed methods using language identification (LID) experiments on highly degraded high-frequency (HF) radio channels and speech recognition experiments on noisy data.
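
The temporal AR step mentioned in the abstract can be illustrated with a minimal pure-Python sketch of frequency domain linear prediction. It is not the authors' implementation: it processes a single full-band signal rather than sub-bands, and the function names (`dct2`, `levinson`, `fdlp_envelope`), the model orders, and the synthetic test signal are all assumptions chosen for illustration. The idea is that linear prediction applied to the DCT of a signal yields an all-pole model whose "spectrum" approximates the squared temporal (Hilbert) envelope, so the peaks of the model follow the high-energy regions in time.

```python
import math
import random

def dct2(x):
    """Orthonormal type-II DCT, direct O(N^2) evaluation (fine for short windows)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out

def levinson(r, order):
    """Levinson-Durbin recursion; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def fdlp_envelope(x, order, n_points):
    """All-pole estimate of the squared temporal envelope of x.

    Linear prediction is run on the DCT coefficients of x; evaluating the
    resulting AR model on angles in (0, pi) sweeps the time axis of the window.
    """
    c = dct2(x)
    n = len(c)
    r = [sum(c[i] * c[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    a, err = levinson(r, order)
    env = []
    for i in range(n_points):
        theta = math.pi * (i + 0.5) / n_points
        re = sum(a[k] * math.cos(k * theta) for k in range(order + 1))
        im = sum(a[k] * math.sin(k * theta) for k in range(order + 1))
        env.append(err / (re * re + im * im))
    return env

# Illustrative signal (an assumption): a tone burst centred at sample 140
# on top of a low-level noise floor.
rng = random.Random(0)
N = 200
signal = [math.sin(0.9 * t) * math.exp(-((t - 140) / 8.0) ** 2) + rng.gauss(0.0, 0.05)
          for t in range(N)]

env_hi = fdlp_envelope(signal, order=8, n_points=N)   # more temporal detail
env_lo = fdlp_envelope(signal, order=2, n_points=N)   # smoother envelope
# Ratio of two AR envelopes with different orders, in the spirit of the
# band-pass modulation filtering described in the abstract.
band_pass = [h / l for h, l in zip(env_hi, env_lo)]
peak_index = max(range(N), key=lambda i: env_hi[i])
```

Here `peak_index` gives the location of the strongest envelope peak, which should fall near the burst. In the paper's scheme the same kind of model is fitted per sub-band and followed by a second AR model along frequency; that part is omitted here.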
Pages: 1285-1295
Number of pages: 11
Related papers
50 in total
  • [21] Speech feature extraction based on wavelet modulation scale for robust speech recognition
    Ma, Xin
    Zhou, Weidong
    Ju, Fang
    Jiang, Qi
    NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 499 - 505
  • [22] Robust Speaker Recognition Using Spectro-Temporal Autoregressive Models
    Mallidi, Sri Harish
    Ganapathy, Sriram
    Hermansky, Hynek
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3656 - 3660
  • [23] Feature Extraction of Brain-Computer Interface based on Improved Multivariate Adaptive Autoregressive Models
    Wang, Jiang
    Xu, Guizhi
    Wang, Lei
    Zhang, Huiyuan
    2010 3RD INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS (BMEI 2010), VOLS 1-7, 2010, : 895 - 898
  • [24] Feature extraction and classification of EEG during mental tasks based on fast multivariate autoregressive models
    Xue, Jianzhong
    Zheng, Chongxun
    Yan, Xiangguo
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2003, 37 (08): : 861 - 864
  • [25] MULTIVARIATE AUTOREGRESSIVE FEATURE EXTRACTION AND THE RECOGNITION OF MULTICHANNEL WAVEFORMS
    TJOSTHEIM, D
    SANDVIN, O
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1979, 1 (01) : 80 - 86
  • [26] Robust feature extraction using multiresolution local pattern information
    Liu, Zhuo
    Wada, Shigeo
    2006 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY, VOLS 1 AND 2, 2006, : 355 - +
  • [27] Robust Character Recognition Using Adaptive Feature Extraction Method
    Mori, Minoru
    Sawaki, Minako
    Yamato, Junji
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (01): : 125 - 133
  • [28] Noise robust speech parameterization using multiresolution feature extraction
    Hariharan, R
    Kiss, I
    Viikki, O
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (08): : 856 - 865
  • [29] A robust texture feature extraction using the localized angular phase
    Khairul Muzzammil Saipullah
    Deok-Hwan Kim
    Multimedia Tools and Applications, 2012, 59 : 717 - 747
  • [30] Robust feature extraction using subband spectral centroid histograms
    Gajic, B
    Paliwal, KK
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 85 - 88