Modelling non-stationary noise with spectral factorisation in automatic speech recognition

Cited by: 15
Authors
Hurmalainen, Antti [1]
Gemmeke, Jort F. [2]
Virtanen, Tuomas [1]
Affiliations
[1] Tampere Univ Technol, Dept Signal Proc, FI-33101 Tampere, Finland
[2] Katholieke Univ Leuven, Dept ESAT PSI, B-3001 Louvain, Belgium
Source
COMPUTER SPEECH AND LANGUAGE | 2013 / Vol. 27 / Issue 03
Funding
Academy of Finland
Keywords
Automatic speech recognition; Noise robustness; Non-stationary noise; Non-negative spectral factorisation; Exemplar-based; Nonnegative matrix factorization; Separation; Algorithms
DOI
10.1016/j.csl.2012.07.008
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech recognition systems intended for everyday use must be able to cope with a large variety of noise types and levels, including highly non-stationary multi-source mixtures. This study applies spectral factorisation algorithms and long temporal context for separating speech and noise from mixed signals. To adapt the system to varying environments, noise models are acquired from the context, or learnt from the mixture itself without prior information. We also propose methods for reducing the size of the bases used for speech and noise modelling by 20-40 times for better practical applicability. We evaluate the performance of the methods both as a standalone classifier and as a signal-enhancing front-end for external recognisers. For the CHiME noisy speech corpus containing non-stationary multi-source household noises at signal-to-noise ratios ranging from +9 to -6 dB, we report average keyword recognition rates up to 87.8% using a single-stream sparse classification algorithm. (c) 2012 Elsevier Ltd. All rights reserved.
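The factorisation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a generic non-negative factorisation with fixed speech and noise bases, a KL-divergence multiplicative update, and an optional sparsity penalty. The function names (factorise_fixed_bases, separate_speech) and all parameters are hypothetical, and the long temporal context, basis reduction and noise-model acquisition described in the abstract are left out.

```python
import numpy as np


def factorise_fixed_bases(V, speech_basis, noise_basis,
                          n_iter=200, sparsity=0.0, eps=1e-12):
    """Estimate non-negative activations H such that V ~= [S N] @ H.

    V            : (bins, frames) non-negative magnitude spectrogram
    speech_basis : (bins, Ks) non-negative speech atoms, kept fixed
    noise_basis  : (bins, Kn) non-negative noise atoms, kept fixed
    sparsity     : L1 penalty weight on activations (assumed, illustrative)
    """
    W = np.hstack([speech_basis, noise_basis])
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    # Denominator of the KL update: W^T applied to an all-ones matrix,
    # which reduces to the column sums of W.
    col_sums = W.sum(axis=0)[:, None]
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (col_sums + sparsity + eps)
    return H


def separate_speech(V, speech_basis, noise_basis, **kwargs):
    """Return a speech estimate obtained by soft-masking the mixture V."""
    H = factorise_fixed_bases(V, speech_basis, noise_basis, **kwargs)
    n_speech = speech_basis.shape[1]
    speech_est = speech_basis @ H[:n_speech]
    noise_est = noise_basis @ H[n_speech:]
    # Ratio mask in [0, 1]; the small constant avoids division by zero.
    mask = speech_est / (speech_est + noise_est + 1e-12)
    return mask * V, H
```

In this sketch the masked spectrogram plays the role of a signal-enhancing front-end output, while the speech activations in H are the quantities a classifier would inspect; how the paper maps activations to keywords is not shown here.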
Pages: 763 - 779
Page count: 17
Related papers
50 entries in total
  • [31] Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments
    Malah, D
    Cox, RV
    Accardi, AJ
    [J]. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 789 - 792
  • [32] Sparse Hidden Markov Models for Speech Enhancement in Non-Stationary Noise Environments
    Deng, Feng
    Bao, Changchun
    Kleijn, W. Bastiaan
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (11) : 1973 - 1987
  • [33] Noise estimation for speech enhancement in non-stationary environments-A new method
    Rama Rao, Ch. V.
    Gowthami
    Harsha
    Rajkumar
    Rama Murthy, M.B.
    Srinivasa Rao, K.
    Anitha Sheela, K.
    [J]. World Academy of Science, Engineering and Technology, 2010, 46 : 738 - 741
  • [34] Robust Speech Enhancement Techniques for ASR in Non-stationary Noise and Dynamic Environments
    Liu, Gang
    Dimitriadis, Dimitrios
    Bocchieri, Enrico
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3016 - 3020
  • [35] A more effective speech enhancement algorithm under non-stationary noise environment
    Cheng, Gong
    Guo, Lei
    Zhao, Tianyun
    He, Sheng
    [J]. Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University, 2010, 28 (05) : 664 - 668
  • [36] Speech enhancement of non-stationary noise based on Controlled Forward Moving Average
    Farrokhi, Dariush
    Togneri, Roberto
    Zaknich, Anthony
    [J]. 2007 INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES, VOLS 1-3, 2007, : 1551 - 1555
  • [37] Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise
    Guangxin Hu
    Sarah C. Determan
    Yue Dong
    Alec T. Beeve
    Joshua E. Collins
    Yan Gai
    [J]. Journal of the Association for Research in Otolaryngology, 2020, 21 : 73 - 87
  • [38] Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise
    Hu, Guangxin
    Determan, Sarah C.
    Dong, Yue
    Beeve, Alec T.
    Collins, Joshua E.
    Gai, Yan
    [J]. JARO-JOURNAL OF THE ASSOCIATION FOR RESEARCH IN OTOLARYNGOLOGY, 2020, 21 (01) : 73 - 87
  • [39] A PSYCHOACOUSTIC SPECTRAL SUBTRACTION METHOD FOR NOISE SUPPRESSION IN AUTOMATIC SPEECH RECOGNITION
    Haque, Serajul
    Togneri, Roberto
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 1618 - 1621
  • [40] INCORPORATING MASK MODELLING FOR NOISE-ROBUST AUTOMATIC SPEECH RECOGNITION
    Koekueer, Muenevver
    Jancovic, Peter
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3929 - 3932