Non-linear feature extraction for robust speech recognition in stationary and non-stationary noise

Cited by: 13
Authors
Zhu, QF [1]
Alwan, A [1]
Affiliations
[1] Univ Calif Los Angeles, Henry Samueli Sch Engn & Appl Sci, Dept Elect Engn, Los Angeles, CA 90095 USA
Source
COMPUTER SPEECH AND LANGUAGE | 2003 / Vol. 17 / Issue 04
DOI
10.1016/S0885-2308(03)00026-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
An analysis-based non-linear feature extraction approach is proposed, inspired by a model of how speech amplitude spectra are affected by additive noise. Acoustic features are extracted based on the noise-robust parts of speech spectra without losing discriminative information. Two non-linear processing methods, harmonic demodulation and spectral peak-to-valley ratio locking, are designed to minimize mismatch between clean and noisy speech features. A previously studied method, peak isolation [IEEE Transactions on Speech and Audio Processing 5 (1997) 451], is also discussed with this model. These methods do not require noise estimation and are effective in dealing with both stationary and non-stationary noise. In the presence of additive noise, ASR experiments show that using these techniques in the computation of MFCCs improves recognition performance greatly. For the TI46 isolated digits database, the average recognition rate across several SNRs is improved from 60% (using unmodified MFCCs) to 95% (using the proposed techniques) with additive speech-shaped noise. For the Aurora 2 connected digit-string database, the average recognition rate across different noise types, including non-stationary noise background, and SNRs improves from 58% to 80%. © 2003 Elsevier Science Ltd. All rights reserved.
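The recognition rates quoted in the abstract can also be read as relative error-rate reductions, a common way to compare ASR results. The sketch below is illustrative arithmetic only, not a figure reported in the abstract: it assumes the error rate is simply 100% minus the recognition rate, and the function name `relative_error_reduction` is hypothetical.

```python
def relative_error_reduction(baseline_rate: float, improved_rate: float) -> float:
    """Fraction of baseline errors removed; rates are recognition rates in percent.

    Assumption: error rate = 100 - recognition rate.
    """
    baseline_err = 100.0 - baseline_rate
    improved_err = 100.0 - improved_rate
    if baseline_err <= 0:
        raise ValueError("baseline recognition rate must be below 100%")
    return (baseline_err - improved_err) / baseline_err


# Average recognition rates as stated in the abstract.
isolated_digits = relative_error_reduction(60.0, 95.0)  # speech-shaped noise
aurora2 = relative_error_reduction(58.0, 80.0)          # multiple noise types

print(f"Isolated digits: {isolated_digits:.1%} of baseline errors removed")
print(f"Aurora 2:        {aurora2:.1%} of baseline errors removed")
```

Under that assumption, the isolated-digit result corresponds to removing 87.5% of the baseline errors (40% down to 5%), and the Aurora 2 result to roughly 52% (42% down to 20%).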
Pages: 381-402
Number of pages: 22