Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition

Cited by: 0
Authors
Eickhoff, Patrick [1]
Moeller, Matthias [2]
Rosin, Theresa Pekarek [1]
Twiefel, Johannes [1, 3]
Wermter, Stefan [1]
Affiliations
[1] Univ Hamburg, Dept Informat, Knowledge Technol, Vogt Koelln Str 30, D-22527 Hamburg, Germany
[2] Orebro Univ, Ctr Appl Autonomous Sensor Syst AASS, Orebro, Sweden
[3] exXxa GmbH, Vogt Koelln Str 30, D-22527 Hamburg, Germany
Keywords
Conformer; Noise Robustness; Speech Recognition
DOI
10.1007/978-3-031-44195-0_31
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have recently reported state-of-the-art performance on various speech processing benchmarks. These systems intrinsically learn to handle noisy conditions and remove noise from speech. Previous research has shown that the denoising capabilities of these models can be extracted into a preprocessor network, which can be used as a frontend for downstream ASR models. However, the proposed methods were limited to specific fully convolutional architectures. In this work, we propose a novel method for extracting the denoising capabilities that can be applied to any encoder-decoder architecture. We propose the Cleancoder preprocessor architecture, which extracts hidden activations from the Conformer ASR model and feeds them to a decoder to predict denoised spectrograms. We train our preprocessor on the Noisy Speech Database (NSD) to reconstruct denoised spectrograms from noisy inputs. Then, we evaluate our model as a frontend to a pretrained Conformer ASR model and as a frontend for training smaller Conformer ASR models from scratch. We show that the Cleancoder is able to filter noise from speech and that it improves the total Word Error Rate (WER) of the downstream model in noisy conditions for both applications.
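The abstract reports results as Word Error Rate (WER). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the authors' evaluation code, and the record does not describe their text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER of `hypothesis` against `reference`.

    Illustrative helper (hypothetical name): words are obtained by
    whitespace splitting, with no further text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the")
    # against a six-word reference.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A lower value is better, and the value can exceed 1.0 when the hypothesis contains many insertions. A corpus-level figure is usually obtained by summing edit counts and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.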
Pages: 376-388
Number of pages: 13
Related papers
50 in total
  • [1] Adding Noise to Improve Noise Robustness in Speech Recognition
    Morales, Nicolas
    Gu, Liang
    Gao, Yuqing
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 861 - +
  • [2] A Curriculum Learning Method for Improved Noise Robustness in Automatic Speech Recognition
    Braun, Stefan
    Neil, Daniel
    Liu, Shih-Chii
    [J]. 2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 548 - 552
  • [3] Toward noise robustness speech recognition
    Namarvar, HH
    Liaw, J
    Berger, TW
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS, 2001, : 4016 - 4016
  • [4] Controlling the Noise Robustness of End-to-End Automatic Speech Recognition Systems
    Moeller, Matthias
    Twiefel, Johannes
    Weber, Cornelius
    Wermter, Stefan
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [5] Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data
    Pervaiz, Ayesha
    Hussain, Fawad
    Israr, Huma
    Tahir, Muhammad Ali
    Raja, Fawad Riasat
    Baloch, Naveed Khan
    Ishmanov, Farruh
    Zikria, Yousaf Bin
    [J]. SENSORS, 2020, 20 (08)
  • [6] Noise Robustness of Tract Variables and their Application to Speech Recognition
    Mitra, Vikramjit
    Nam, Hosung
    Espy-Wilson, Carol
    Saltzman, Elliot
    Goldstein, Louis
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2735 - +
  • [7] Improving Noise Robustness of Speech Emotion Recognition System
    Juszkiewicz, Lukasz
    [J]. INTELLIGENT DISTRIBUTED COMPUTING VII, 2014, 511 : 223 - 232
  • [8] Noise and speaker robustness in a Persian continuous speech recognition system
    Veisi, Hadi
    Sameti, Hossein
    [J]. 2007 9TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1-3, 2007, : 73 - 76
  • [9] An overview of noise-robust automatic speech recognition
    Li, Jinyu
    Deng, Li
    Gong, Yifan
    Haeb-Umbach, Reinhold
    [J]. IEEE Transactions on Audio, Speech and Language Processing, 2014, 22 (04) : 745 - 777
  • [10] Robust automatic speech recognition in the presence of impulsive noise
    Potamitis, I
    Fakotakis, N
    Kokkinakis, G
    [J]. ELECTRONICS LETTERS, 2001, 37 (12) : 799 - 800