Robust Feature Extraction Using Modulation Filtering of Autoregressive Models

Cited by: 39

Authors:

Ganapathy, Sriram [1]

Mallidi, Sri Harish [2]

Hermansky, Hynek [2]

Affiliations:

[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2014 / Vol. 22 / No. 08

Keywords:

Autoregressive modeling; feature extraction; modulation filtering; speaker and language recognition; FRONT-END; SPEECH; RECOGNITION;

DOI:

10.1109/TASLP.2014.2329190

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Speaker and language recognition in noisy and degraded channel conditions continues to be a challenging problem, mainly due to the mismatch between clean training and noisy test conditions. In the presence of noise, the most reliable portions of the signal are the high-energy regions, which can be used for robust feature extraction. In this paper, we propose a front-end processing scheme based on autoregressive (AR) models, which represent the high-energy regions with good accuracy, followed by a modulation filtering process. The AR model of the spectrogram is derived using two separable AR transforms, one in time and one in frequency. The first AR model (temporal AR model) of the sub-band Hilbert envelopes is derived using frequency domain linear prediction (FDLP). This is followed by a spectral AR model applied on the FDLP envelopes. The output 2-D AR model represents a low-pass modulation filtered spectrogram of the speech signal. Band-pass modulation filtered spectrograms can further be derived by dividing two AR models with different model orders (cut-off frequencies). The modulation filtered spectrograms are converted to cepstral coefficients and are used for a speaker recognition task in noisy and reverberant conditions. Various speaker recognition experiments are performed with clean and noisy versions of the NIST-2010 speaker recognition evaluation (SRE) database using a state-of-the-art speaker recognition system. In these experiments, the proposed front-end analysis provides substantial improvements (relative improvements of up to 25%) compared to baseline techniques. Furthermore, we illustrate the generalizability of the proposed methods using language identification (LID) experiments on highly degraded high-frequency (HF) radio channels and speech recognition experiments on noisy data.
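The temporal AR step mentioned in the abstract (FDLP) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it works on the full band rather than on sub-bands, omits the spectral AR model and the cepstral conversion, and the function name `fdlp_envelope` and its parameters `order` and `n_points` are hypothetical. The idea shown is the standard one behind FDLP: linear prediction applied to the DCT of a signal yields an all-pole model whose "spectrum" is indexed by time and approximates the squared temporal envelope.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_envelope(signal, order=20, n_points=None):
    """Approximate the squared temporal envelope of `signal` with an
    all-pole (AR) model fitted in the DCT domain (full-band sketch)."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    n_points = n if n_points is None else n_points

    # 1. Move to the frequency domain with an orthonormal DCT-II.
    c = dct(x, type=2, norm="ortho")

    # 2. Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[: n - k], c[k:]) for k in range(order + 1)])

    # 3. Solve the Yule-Walker (normal) equations for the predictor.
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    gain = r[0] - np.dot(a, r[1 : order + 1])  # prediction error energy

    # 4. Evaluate the all-pole model on a half circle; because the model
    #    was fitted in the DCT domain, this axis corresponds to time.
    denom = np.fft.rfft(np.concatenate(([1.0], -a)), 2 * n_points)[:n_points]
    return gain / np.abs(denom) ** 2
```

A lower `order` gives a smoother envelope, which is the sense in which the model order acts like a modulation cut-off; under the same assumptions, the ratio of two such envelopes computed with different orders would give a rough band-pass modulation filtered envelope.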

Pages: 1285-1295

Page count: 11
