Modelling non-stationary noise with spectral factorisation in automatic speech recognition

Cited by: 15
Authors
Hurmalainen, Antti [1]
Gemmeke, Jort F. [2]
Virtanen, Tuomas [1]
Affiliations
[1] Tampere Univ Technol, Dept Signal Proc, FI-33101 Tampere, Finland
[2] Katholieke Univ Leuven, Dept ESAT PSI, B-3001 Louvain, Belgium
Source
COMPUTER SPEECH AND LANGUAGE | 2013 / Volume 27 / Issue 03
Funding
Academy of Finland;
Keywords
Automatic speech recognition; Noise robustness; Non-stationary noise; Non-negative spectral factorisation; Exemplar-based; NONNEGATIVE MATRIX FACTORIZATION; SEPARATION; ALGORITHMS;
DOI
10.1016/j.csl.2012.07.008
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech recognition systems intended for everyday use must be able to cope with a large variety of noise types and levels, including highly non-stationary multi-source mixtures. This study applies spectral factorisation algorithms and long temporal context for separating speech and noise from mixed signals. To adapt the system to varying environments, noise models are acquired from the context, or learnt from the mixture itself without prior information. We also propose methods for reducing the size of the bases used for speech and noise modelling by 20-40 times for better practical applicability. We evaluate the performance of the methods both as a standalone classifier and as a signal-enhancing front-end for external recognisers. For the CHiME noisy speech corpus containing non-stationary multi-source household noises at signal-to-noise ratios ranging from +9 to -6 dB, we report average keyword recognition rates up to 87.8% using a single-stream sparse classification algorithm. (c) 2012 Elsevier Ltd. All rights reserved.
Pages: 763-779
Number of pages: 17
Related papers
50 in total
  • [1] NON-STATIONARY FEATURE EXTRACTION FOR AUTOMATIC SPEECH RECOGNITION
    Tueske, Zoltan
    Golik, Pavel
    Schlueter, Ralf
    Drepper, Friedhelm R.
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5204 - 5207
  • [2] Non-linear feature extraction for robust speech recognition in stationary and non-stationary noise
    Zhu, QF
    Alwan, A
    [J]. COMPUTER SPEECH AND LANGUAGE, 2003, 17 (04): : 381 - 402
  • [3] Speech enhancement for non-stationary noise environments
    Cohen, I
    Berdugo, B
    [J]. SIGNAL PROCESSING, 2001, 81 (11) : 2403 - 2418
  • [4] Particle filter based non-stationary noise tracking for robust speech recognition
    Fujimoto, M
    Nakamura, S
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 257 - 260
  • [5] Speech recognition in non-stationary adverse environments
    Wang, ZH
    Kenny, P
    [J]. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 265 - 268
  • [6] Spectral estimation of non-stationary white noise
    Allen, JC
    Hobbs, SL
    [J]. JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 1997, 334B (01): : 99 - 116
  • [7] MODELLING SPECTRO-TEMPORAL DYNAMICS IN FACTORISATION-BASED NOISE-ROBUST AUTOMATIC SPEECH RECOGNITION
    Hurmalainen, Antti
    Virtanen, Tuomas
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4113 - 4116
  • [8] AN ANALYSIS OF VECTOR TAYLOR SERIES MODEL COMPENSATION FOR NON-STATIONARY NOISE IN SPEECH RECOGNITION
    Duc Hoang Ha Nguyen
    Xiao, Xiong
    Chng, Eng Siong
    Li, Haizhou
    [J]. 2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 131 - 135
  • [9] Towards non-stationary model-based noise adaptation for large vocabulary speech recognition
    Kristjansson, T
    Frey, B
    Deng, L
    Acero, A
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 337 - 340
  • [10] FEATURE ENHANCEMENT BY BIDIRECTIONAL LSTM NETWORKS FOR CONVERSATIONAL SPEECH RECOGNITION IN HIGHLY NON-STATIONARY NOISE
    Woellmer, Martin
    Zhang, Zixing
    Weninger, Felix
    Schuller, Bjoern
    Rigoll, Gerhard
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6822 - 6826