Comparing Front-End Enhancement Techniques and Multiconditioned Training for Robust Automatic Speech Recognition

Cited by: 1
Authors
Soni, Meet H. [1]
Joshi, Sonal [1]
Panda, Ashish [1]
Affiliations
[1] TCS Innovat Labs, Mumbai, Maharashtra, India
Source
Keywords
Speech recognition; Noise robustness; Front-end processing; Multiconditioned training; FEATURES; NOISE
DOI
10.1007/978-3-030-27947-9_28
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a comparison of various front-end enhancement techniques and multiconditioned training for robust Automatic Speech Recognition (ASR) under additive noise. The front-ends we compare are De-noising Autoencoders (DAEs) based on Deep Neural Network (DNN) and Time-Delay Neural Network (TDNN) architectures, and a DNN front-end based on Time-Frequency (T-F) masking. We train these front-ends and evaluate their performance on various seen and unseen noise conditions. In multiconditioned training, we train the acoustic model on various noise conditions, along with Noise Aware Training (NAT), and test it on seen and unseen noises. The results suggest that all front-ends improve performance in seen noise conditions but degrade it in unseen noise conditions. TDNN-DAE provides the largest improvement in seen conditions and the largest degradation in unseen conditions. To improve the performance of TDNN-DAE in unseen conditions, we train it on features enhanced using Vector Taylor Series with Acoustic Masking (VTS-AM) and Spectral Subtraction (SS). We show that these enhancement techniques significantly improve the efficacy of the TDNN-DAE in unseen noise conditions. Overall, we observe that multiconditioned training still gives better performance in both seen and unseen noise conditions, although the enhanced TDNN-DAE comes closest among all the front-ends to the performance of multiconditioned training.
Pages: 329-340
Number of pages: 12
Related papers
50 in total
  • [31] Implementation of an acoustic front-end for speech recognition
    Albarello, Alain
    Breitschaedel, Robert
    Ciaramella, Alberto
    Lenormand, Eric
    Pacifici, Roberto
    Potage, Jean
    Riviere, Jean-Pierre
    Scheibel, Norbert
    Venuti, Giovanni
    CSELT Technical Reports, 1988, 16 (05): 455-459
  • [32] Feature enhancement for a bitstream-based front-end in wireless speech recognition
    Kim, HK
    Cox, RV
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001: 241-244
  • [33] A biological front-end processing for speech recognition
    Ferrandez, JM
    del Valle, D
    Rodellar, V
    Gomez, P
    BIOLOGICAL AND ARTIFICIAL COMPUTATION: FROM NEUROSCIENCE TO TECHNOLOGY, 1997, 1240: 1058-1067
  • [34] Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition
    Liu, Bin
    Nie, Shuai
    Liang, Shan
    Liu, Wenju
    Yu, Meng
    Chen, Lianwu
    Peng, Shouye
    Li, Changliang
    INTERSPEECH 2019, 2019: 491-495
  • [35] ADAPTIVE DIFFERENTIAL MICROPHONE ARRAYS USED AS A FRONT-END FOR AN AUTOMATIC SPEECH RECOGNITION SYSTEM
    Messner, Elmar
    Pessentheiner, Hannes
    Morales-Cordovilla, Juan A.
    Hagmueller, Martin
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 2689-2693
  • [36] Efficient Noise-Robust Speech Recognition Front-End Based on the ETSI Standard
    Neves, Claudio
    Veiga, Arlindo
    Sa, Luis
    Perdigao, Fernando
    ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008: 609-612
  • [37] A new approach to variable frame rate front-end processing for robust speech recognition
    Epps, J
    ISSPA 2005: The 8th International Symposium on Signal Processing and its Applications, Vols 1 and 2, Proceedings, 2005: 723-726
  • [38] Experiments on Front-End Techniques and Segmentation Model for Robust Indian Language Speech Recognizer
    Sriranjani, R.
    Karthick, Murali B.
    Umesh, S.
    2014 TWENTIETH NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2014
  • [39] Multi-microphone noise reduction techniques as front-end devices for speech recognition
    Bitzer, J
    Simmer, KU
    Kammeyer, KD
    SPEECH COMMUNICATION, 2001, 34 (1-2): 3-12
  • [40] Using Twin-HMM-Based Audio-Visual Speech Enhancement as a Front-End for Robust Audio-Visual Speech Recognition
    Abdelaziz, Ahmed Hussen
    Zeiler, Steffen
    Kolossa, Dorothea
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 867-871