Comparing Front-End Enhancement Techniques and Multiconditioned Training for Robust Automatic Speech Recognition

Cited by: 1
Authors
Soni, Meet H. [1]
Joshi, Sonal [1]
Panda, Ashish [1]
Affiliation
[1] TCS Innovat Labs, Mumbai, Maharashtra, India
Source
Keywords
Speech recognition; Noise robustness; Front-end processing; Multiconditioned training; FEATURES; NOISE
DOI
10.1007/978-3-030-27947-9_28
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a comparison of various front-end enhancement techniques and multiconditioned training for robust Automatic Speech Recognition (ASR) under additive noise. We compare three front-ends: Denoising Autoencoders (DAEs) built on Deep Neural Network (DNN) and Time-Delay Neural Network (TDNN) architectures, and a DNN-based Time-Frequency (T-F) masking front-end. We train these front-ends and evaluate their performance on various seen and unseen noise conditions. For multiconditioned training, we train the acoustic model on various noise conditions, also evaluating Noise Aware Training (NAT), and test it on seen and unseen noises. The results suggest that all front-ends improve performance in seen noise conditions but degrade it in unseen noise conditions. The TDNN-DAE gives the largest improvement in seen conditions and the largest degradation in unseen conditions. To improve the TDNN-DAE in unseen conditions, we train it on features enhanced using Vector Taylor Series with Acoustic Masking (VTS-AM) and Spectral Subtraction (SS), and show that these enhancement techniques significantly improve its efficacy in unseen noise conditions. Overall, multiconditioned training still performs better in both seen and unseen noise conditions, although among all the front-ends the enhanced TDNN-DAE comes closest to it.
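The abstract names Spectral Subtraction (SS) as one of the enhancement steps applied before the TDNN-DAE, and Noise Aware Training (NAT) in the multiconditioned setting. As an illustration only, the Python sketch below shows textbook forms of both: power spectral subtraction with a noise estimate averaged over the first few frames (assumed to be speech-free), an over-subtraction factor `alpha`, and a spectral floor; and NAT-style input, in which the same utterance-level noise estimate is appended to every frame. The function names (`estimate_noise`, `spectral_subtraction`, `noise_aware_features`), the default parameter values, and the leading-frames noise estimate are hypothetical choices for this sketch, not the configuration used in the paper.

```python
from typing import List

# Hypothetical helpers for illustration; not the paper's implementation.
Frames = List[List[float]]  # rows = frames, columns = frequency bins or feature dims


def estimate_noise(frames: Frames, n_noise_frames: int = 10) -> List[float]:
    """Average the first n_noise_frames frames, assuming (hypothetically) they are speech-free."""
    k = min(n_noise_frames, len(frames))
    if k == 0:
        raise ValueError("need at least one frame to estimate noise")
    n_dims = len(frames[0])
    return [sum(frame[d] for frame in frames[:k]) / k for d in range(n_dims)]


def spectral_subtraction(power: Frames, noise: List[float],
                         alpha: float = 1.0, floor: float = 0.01) -> Frames:
    """Power spectral subtraction: subtract alpha * noise power in each bin,
    then clamp to floor * noisy power so non-negative input never goes negative."""
    return [
        [max(y - alpha * n, floor * y) for y, n in zip(frame, noise)]
        for frame in power
    ]


def noise_aware_features(features: Frames, noise: List[float]) -> Frames:
    """NAT-style input: append the utterance-level noise estimate to every frame."""
    return [list(frame) + list(noise) for frame in features]


# Toy power spectra (3 frames x 2 bins); the first two frames are noise only.
noisy = [[1.0, 2.0], [1.0, 2.0], [5.0, 2.5]]
noise = estimate_noise(noisy, n_noise_frames=2)   # [1.0, 2.0]
enhanced = spectral_subtraction(noisy, noise)     # [[0.01, 0.02], [0.01, 0.02], [4.0, 0.5]]
nat_input = noise_aware_features(noisy, noise)    # each frame now has 4 dims
```

In a real system, SS would operate on short-time power spectra and NAT would typically use the acoustic model's own features (for example log-mel), with the enhanced spectra converted back into ASR features; those steps are omitted here to keep the sketch self-contained.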
Pages: 329-340
Page count: 12