Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition

Cited by: 0
Authors
Eickhoff, Patrick [1]
Moeller, Matthias [2]
Rosin, Theresa Pekarek [1]
Twiefel, Johannes [1, 3]
Wermter, Stefan [1]
Affiliations
[1] Univ Hamburg, Dept Informat, Knowledge Technol, Vogt Koelln Str 30, D-22527 Hamburg, Germany
[2] Orebro Univ, Ctr Appl Autonomous Sensor Syst AASS, Orebro, Sweden
[3] exXxa GmbH, Vogt Koelln Str 30, D-22527 Hamburg, Germany
Keywords
Conformer; Noise Robustness; Speech Recognition;
DOI
10.1007/978-3-031-44195-0_31
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent research in the domain of speech processing, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle and remove noise from speech. Previous research has shown that it is possible to extract the denoising capabilities of these models into a preprocessor network, which can be used as a frontend for downstream ASR models. However, the proposed methods were limited to specific fully convolutional architectures. In this work, we propose a novel method to extract the denoising capabilities that can be applied to any encoder-decoder architecture. We propose the Cleancoder preprocessor architecture, which extracts hidden activations from the Conformer ASR model and feeds them to a decoder to predict denoised spectrograms. We train our preprocessor on the Noisy Speech Database (NSD) to reconstruct denoised spectrograms from noisy inputs. Then, we evaluate our model as a frontend to a pretrained Conformer ASR model as well as a frontend to train smaller Conformer ASR models from scratch. We show that the Cleancoder is able to filter noise from speech and that it improves the total Word Error Rate (WER) of the downstream model in noisy conditions for both applications.
Pages: 376-388
Number of pages: 13
Related papers
50 in total
  • [21] Issues with uncertainty decoding for noise robust automatic speech recognition
    Liao, H.
    Gales, M. J. F.
    [J]. SPEECH COMMUNICATION, 2008, 50 (04) : 265 - 277
  • [22] Minimum based noise suppression for improved automatic speech recognition
    Fernández, J
    Meyer, C
    Fischer, A
    [J]. PROCEEDINGS OF THE SIXTH IASTED INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING, 2004, : 243 - 248
  • [23] JOINT NOISE ADAPTIVE TRAINING FOR ROBUST AUTOMATIC SPEECH RECOGNITION
    Narayanan, Arun
    Wang, DeLiang
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [24] Handling Convolutional Noise in Missing Data Automatic Speech Recognition
    Van Segbroeck, Maarten
    Van Hamme, Hugo
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2562 - 2565
  • [25] AN ASSESSMENT OF AUTOMATIC SPEECH RECOGNITION AS SPEECH INTELLIGIBILITY ESTIMATION IN THE CONTEXT OF ADDITIVE NOISE
    Liu, Wei M.
    Mason, John S. D.
    Evans, Nicholas W. D.
    Jellyman, Keith A.
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2166 - 2169
  • [26] Noise Robust Exemplar Matching for Speech Enhancement: Applications to Automatic Speech Recognition
    Yilmaz, Emre
    Baby, Deepak
    Van Hamme, Hugo
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 688 - 692
  • [27] Factorial Speech Processing Models for Noise-Robust Automatic Speech Recognition
    Khademian, Mahdi
    Homayounpour, Mohammad Mehdi
    [J]. 2015 23RD IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2015, : 637 - 642
  • [29] Robust Automatic Speech Recognition System for the Recognition of Continuous Kannada Speech Sentences in the Presence of Noise
    Mahadevaswamy
    [J]. WIRELESS PERSONAL COMMUNICATIONS, 2023, 130 (03) : 2039 - 2058
  • [30] Noise robust automatic speech recognition with adaptive quantile based noise estimation and speech band emphasizing filter bank
    Bonde, CS
    Graversen, C
    Gregersen, AG
    Ngo, KH
    Normark, K
    Purup, M
    Thorsen, T
    Lindberg, B
    [J]. NONLINEAR ANALYSES AND ALGORITHMS FOR SPEECH PROCESSING, 2005, 3817 : 291 - 302