Comparing Front-End Enhancement Techniques and Multiconditioned Training for Robust Automatic Speech Recognition

被引:1
|
作者
Soni, Meet H. [1 ]
Joshi, Sonal [1 ]
Panda, Ashish [1 ]
机构
[1] TCS Innovat Labs, Mumbai, Maharashtra, India
来源
关键词
Speech recognition; Noise robustness; Front-end processing; Multiconditioned training; FEATURES; NOISE;
D O I
10.1007/978-3-030-27947-9_28
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present comparison of various front-end enhancement techniques and multiconditioned training for robust Automatic Speech Recognition (ASR) for additive noise. We compare De-noising Autoencoders (DAEs) based on Deep Neural Network (DNN), Time-Delay Neural Network (TDNN) architecture, and Time-Frequency (T-F) masking based DNN based front-ends. We train these front-ends and evaluate their performance on various seen/unseen noise conditions. In multiconditioned training, we train acoustic model on various noise conditions and test on seen/unseen noises along with Noise Aware Training (NAT). The results suggest that all front-ends provide performance improvement for seen noise conditions while degrading performance for unseen noise conditions. TDNN-DAE provides the most improvement for seen conditions while giving the most degradation for unseen conditions. We use a method to improve performance of TDNN-DAE in unseen conditions by training it on features enhanced using Vector Taylor Series with Acoustic Masking (VTS-AM) and Spectral Subtraction (SS). We show that these enhancement techniques improve the efficacy of the TDNN-DAE significantly in unseen noise conditions. Overall we observed that multiconditioned training still gives better performance in case of both seen/unseen noise conditions, although the enhanced TDNN-DAE comes closest among all the front-ends to the performance of multiconditioned training.
引用
收藏
页码:329 / 340
页数:12
相关论文
共 50 条
  • [21] Front-End Feature Compensation for Noise Robust Speech Emotion Recognition
    Pandharipande, Meghna
    Chakraborty, Rupayan
    Panda, Ashish
    Das, Biswajit
    Kopparapu, Sunil Kumar
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [22] Robust Front-End based on MVA processing for Arabic Speech Recognition
    Techini, Elhem
    Sakka, Zied
    Bouhlel, MedSalim
    2017 INTERNATIONAL CONFERENCE ON ENGINEERING & MIS (ICEMIS), 2017,
  • [23] Investigation of Monaural Front-End Processing for Robust Speech Recognition Without Retraining or Joint-Training
    Du, Zhihao
    Zhang, Xueliang
    Han, Jiqing
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 249 - 254
  • [24] A companding front end for noise-robust automatic speech recognition
    Guinness, J
    Raj, B
    Schmidt-Nielsen, B
    Turicchia, L
    Sarpeshkar, R
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 249 - 252
  • [25] A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition
    Yapanel, Umit H.
    Hansen, John H. L.
    SPEECH COMMUNICATION, 2008, 50 (02) : 142 - 152
  • [26] MULTICHANNEL AUDIO FRONT-END FOR FAR-FIELD AUTOMATIC SPEECH RECOGNITION
    Chhetri, Amit
    Hilmes, Philip
    Kristjansson, Trausti
    Chu, Wai
    Mansour, Mohamed
    Li, Xiaoxue
    Zhang, Xianxian
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 1527 - 1531
  • [27] A Cross-entropy-guided (CEG) Measure for Speech Enhancement Front-end Assessing Performances of Back-end Automatic Speech Recognition
    Chai, Li
    Du, Jun
    Lee, Chin-Hui
    INTERSPEECH 2019, 2019, : 3431 - 3435
  • [28] A noise-robust front-end for distributed speech recognition in mobile communications
    Addou, Djamel
    Selouani, Sid-Ahmed
    Kifaya, Kaoukeb
    Boudraa, Malika
    Boudraa, Bachir
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2007, 10 (04) : 167 - 173
  • [29] Front-end Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition
    Chakraborty, Rupayan
    Panda, Ashish
    Pandharipande, Meghna
    Joshi, Sonal
    Kopparapu, Sunil Kumar
    INTERSPEECH 2019, 2019, : 3257 - 3261
  • [30] Investigation into a Mel subspace based front-end processing for robust speech recognition
    Selouani, SA
    O'Shaughnessy, D
    Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, 2004, : 187 - 190