Robust Feature Extraction Using Modulation Filtering of Autoregressive Models

Cited by: 39

Authors:

Ganapathy, Sriram [1]

Mallidi, Sri Harish [2]

Hermansky, Hynek [2]

Affiliations:

[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2014 / Vol. 22 / No. 08

Keywords:

Autoregressive modeling; feature extraction; modulation filtering; speaker and language recognition; FRONT-END; SPEECH; RECOGNITION;

DOI:

10.1109/TASLP.2014.2329190

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Speaker and language recognition in noisy and degraded channel conditions continues to be a challenging problem mainly due to the mismatch between clean training and noisy test conditions. In the presence of noise, the most reliable portions of the signal are the high energy regions which can be used for robust feature extraction. In this paper, we propose a front-end processing scheme based on autoregressive (AR) models that represent the high energy regions with good accuracy, followed by a modulation filtering process. The AR model of the spectrogram is derived using two separable time and frequency AR transforms. The first AR model (temporal AR model) of the sub-band Hilbert envelopes is derived using frequency domain linear prediction (FDLP). This is followed by a spectral AR model applied on the FDLP envelopes. The output 2-D AR model represents a low-pass modulation filtered spectrogram of the speech signal. The band-pass modulation filtered spectrograms can further be derived by dividing two AR models with different model orders (cut-off frequencies). The modulation filtered spectrograms are converted to cepstral coefficients and are used for a speaker recognition task in noisy and reverberant conditions. Various speaker recognition experiments are performed with clean and noisy versions of the NIST-2010 speaker recognition evaluation (SRE) database using the state-of-the-art speaker recognition system. In these experiments, the proposed front-end analysis provides substantial improvements (relative improvements of up to 25%) compared to baseline techniques. Furthermore, we also illustrate the generalizability of the proposed methods using language identification (LID) experiments on highly degraded high-frequency (HF) radio channels and speech recognition experiments on noisy data.
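
The temporal AR step mentioned in the abstract (FDLP) can be illustrated with a minimal sketch. This is not the authors' implementation: it works on the full-band signal rather than on sub-bands, omits the spectral AR model and the modulation filtering, and the names dct2, fdlp_envelope, and order are hypothetical. It assumes only NumPy. The idea shown is that linear prediction applied to the DCT of a signal gives an all-pole model whose "spectrum" traces the temporal (squared Hilbert) envelope, so peaks fall on the high energy regions.

```python
import numpy as np

def dct2(x):
    """Unnormalised DCT-II via an explicit cosine matrix (fine for short signals)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * (2 * t + 1) * k / (2 * n)) @ x

def fdlp_envelope(x, order=20):
    """Approximate the squared Hilbert envelope of x with an all-pole model."""
    n = len(x)
    c = dct2(np.asarray(x, dtype=float))
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([c[: n - lag] @ c[lag:] for lag in range(order + 1)])
    r[0] *= 1.0 + 1e-9  # tiny diagonal loading for numerical stability
    # Solve the Yule-Walker normal equations R a = r[1:], R[i, j] = r[|i - j|].
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], r[1:])
    poly = np.concatenate(([1.0], -a))
    gain = r[0] - a @ r[1:]  # prediction error energy
    # Evaluate gain / |A(e^{jw})|^2 at w = pi*k/n, k = 0..n-1; w plays the role of time.
    spectrum = np.fft.rfft(poly, 2 * n)[:n]
    return gain / np.abs(spectrum) ** 2

# Toy input (an assumption for illustration): a short tone burst centred at
# sample 150, buried in weak white noise.
rng = np.random.default_rng(0)
n = 512
t = np.arange(n)
burst = np.exp(-0.5 * ((t - 150) / 15.0) ** 2) * np.cos(2 * np.pi * 0.1 * t)
signal = burst + 0.01 * rng.standard_normal(n)
envelope = fdlp_envelope(signal, order=20)
peak = int(np.argmax(envelope))
```

In this toy case the envelope maximum should land near the centre of the burst. A lower model order gives a smoother envelope, which loosely corresponds to the abstract's remark that the model order sets the modulation cut-off frequency.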


Pages: 1285 - 1295

Number of pages: 11

50 items in total

[21] Speech feature extraction based on wavelet modulation scale for robust speech recognition
Ma, Xin
Zhou, Weidong
Ju, Fang
Jiang, Qi
NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 499 - 505
[22] Robust Speaker Recognition Using Spectro-Temporal Autoregressive Models
Mallidi, Sri Harish
Ganapathy, Sriram
Hermansky, Hynek
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 3656 - 3660
[23] Feature Extraction of Brain-Computer Interface based on Improved Multivariate Adaptive Autoregressive Models
Wang, Jiang
Xu, Guizhi
Wang, Lei
Zhang, Huiyuan
2010 3RD INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS (BMEI 2010), VOLS 1-7, 2010: 895 - 898
[24] Feature extraction and classification of EEG during mental tasks based on fast multivariate autoregressive models
Xue, Jianzhong
Zheng, Chongxun
Yan, Xiangguo
Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2003, 37 (08): 861 - 864
[25] MULTIVARIATE AUTOREGRESSIVE FEATURE EXTRACTION AND THE RECOGNITION OF MULTICHANNEL WAVEFORMS
TJOSTHEIM, D
SANDVIN, O
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1979, 1 (01) : 80 - 86
[26] Robust feature extraction using multiresolution local pattern information
Liu, Zhuo
Wada, Shigeo
2006 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY, VOLS 1 AND 2, 2006, : 355 - +
[27] Robust Character Recognition Using Adaptive Feature Extraction Method
Mori, Minoru
Sawaki, Minako
Yamato, Junji
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (01): 125 - 133
[28] Noise robust speech parameterization using multiresolution feature extraction
Hariharan, R
Kiss, I
Viikki, O
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (08): 856 - 865
[29] A robust texture feature extraction using the localized angular phase
Khairul Muzzammil Saipullah
Deok-Hwan Kim
Multimedia Tools and Applications, 2012, 59 : 717 - 747
[30] Robust feature extraction using subband spectral centroid histograms
Gajic, B
Paliwal, KK
2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001: 85 - 88
