A spatio-temporal speech enhancement scheme for robust speech recognition in noisy environments

Cited by: 27
Authors
Visser, E
Otsuka, M
Lee, TW
Affiliations
[1] Univ Calif San Diego, Inst Neural Computat, Dept 0523, La Jolla, CA 92093 USA
[2] DENSO Corp, Res Labs, Aichi 4700111, Japan
Keywords
speech enhancement; robust speech recognition; blind source separation; noisy environments;
DOI
10.1016/S0167-6393(03)00010-4
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
A new speech enhancement scheme is presented integrating spatial and temporal signal processing methods for robust speech recognition in noisy environments. The scheme first separates spatially localized point sources from noisy speech signals recorded by two microphones. Blind source separation algorithms assuming no a priori knowledge about the sources involved are applied in this spatial processing stage. Then denoising of distributed background noise is achieved in a combined spatial/temporal processing approach. The desired speaker signal is first processed along with an artificially constructed noise signal in a supplementary blind source separation step. It is further denoised by exploiting differences in temporal speech and noise statistics in a wavelet filterbank. The scheme's performance is illustrated by speech recognition experiments on real recordings in a noisy car environment. In comparison to a common multi-microphone technique like beamforming with spectral subtraction, the scheme is shown to enable more accurate speech recognition in the presence of a highly interfering point source and strong background noise. (C) 2003 Elsevier B.V. All rights reserved.
Pages: 393-407
Page count: 15
Related papers
50 in total
  • [21] An effective cluster-based model for robust speech detection and speech recognition in noisy environments
    Górriz, J.M.
    Ramírez, J.
    Segura, J.C.
    Puntonet, C.G.
    [J]. Journal of the Acoustical Society of America, 2006, 120(1): 470-481
  • [22] Speech enhancement method based on feature compensation gain for effective speech recognition in noisy environments
    Bae, Ara
    Kim, Wooil
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2019, 38(1): 51-55
  • [23] Auditory model for robust speech recognition in real world noisy environments
    Kim, DS
    Lee, SY
    Kil, RM
    Zhu, XL
    [J]. ELECTRONICS LETTERS, 1997, 33(1): 12-13
  • [24] Blind source extraction for robust speech recognition in multisource noisy environments
    Nesta, Francesco
    Matassoni, Marco
    [J]. COMPUTER SPEECH AND LANGUAGE, 2013, 27(3): 703-725
  • [25] ROBUST SPEECH RECOGNITION UNDER NOISY ENVIRONMENTS USING ASYMMETRIC TAPERS
    Alam, Md Jahangir
    Kenny, Patrick
    O'Shaughnessy, Douglas
    [J]. 2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012: 1638-1642
  • [26] Spatio-temporal wavelets and tracking in noisy environments
    Mujica, F
    Murenzi, R
    Smith, MJT
    [J]. WAVELET APPLICATIONS V, 1998, 3391: 560-568
  • [27] A signal subspace approach to spatio-temporal prediction for multichannel speech enhancement
    Adam Borowicz
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [28] A Spatio-Temporal Speech Enhancement Technique Based on Generalized Eigenvalue Decomposition
    Gupta, Malay
    Douglas, Scott C.
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17(4): 830-839
  • [29] A signal subspace approach to spatio-temporal prediction for multichannel speech enhancement
    Borowicz, Adam
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015: 1-12
  • [30] COMPARISON OF DIFFERENT SPEECH ENHANCEMENT METHODS ON RECOGNITION OF NOISY SPEECH
    AHMED, MS
    ALMARZOUG, AM
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 1994, 19(1): 45-56