Comparing Front-End Enhancement Techniques and Multiconditioned Training for Robust Automatic Speech Recognition

Cited by: 1
Authors
Soni, Meet H. [1]
Joshi, Sonal [1]
Panda, Ashish [1]
Affiliation
[1] TCS Innovation Labs, Mumbai, Maharashtra, India
Keywords
Speech recognition; Noise robustness; Front-end processing; Multiconditioned training; Features; Noise
DOI
10.1007/978-3-030-27947-9_28
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We present a comparison of various front-end enhancement techniques and multiconditioned training for robust Automatic Speech Recognition (ASR) under additive noise. We compare front-ends based on Denoising Autoencoders (DAEs) built on Deep Neural Network (DNN) and Time-Delay Neural Network (TDNN) architectures, and a front-end based on DNN-based Time-Frequency (T-F) masking. We train these front-ends and evaluate their performance under various seen and unseen noise conditions. For multiconditioned training, we train the acoustic model on various noise conditions, along with Noise Aware Training (NAT), and test it on seen and unseen noises. The results suggest that all front-ends improve performance for seen noise conditions but degrade it for unseen noise conditions. The TDNN-DAE provides the largest improvement for seen conditions while causing the largest degradation for unseen conditions. To improve the performance of the TDNN-DAE in unseen conditions, we train it on features enhanced using Vector Taylor Series with Acoustic Masking (VTS-AM) and Spectral Subtraction (SS). We show that these enhancement techniques significantly improve the efficacy of the TDNN-DAE in unseen noise conditions. Overall, we observed that multiconditioned training still gives better performance in both seen and unseen noise conditions, although among all the front-ends the enhanced TDNN-DAE comes closest to the performance of multiconditioned training.
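The abstract names Spectral Subtraction (SS) as one of the enhancement steps applied to features before training the TDNN-DAE. The sketch below is for illustration only. It shows a generic textbook form of power spectral subtraction and is not the authors' implementation or configuration. It assumes precomputed per-frame power spectra and estimates the noise from the first few frames, which are assumed to be speech-free. The names `estimate_noise` and `spectral_subtract` and the parameters `alpha` (over-subtraction factor) and `beta` (spectral floor) are hypothetical.

```python
from typing import List

# One frame = power spectrum values, one per frequency bin (assumed precomputed).
Frame = List[float]


def estimate_noise(frames: List[Frame], n_noise_frames: int) -> Frame:
    """Average the first n_noise_frames frames bin by bin.

    Assumption: these leading frames contain noise only (no speech).
    """
    if not frames or n_noise_frames < 1:
        raise ValueError("need at least one frame and n_noise_frames >= 1")
    lead = frames[:n_noise_frames]
    n_bins = len(lead[0])
    return [sum(f[k] for f in lead) / len(lead) for k in range(n_bins)]


def spectral_subtract(frames: List[Frame], n_noise_frames: int = 2,
                      alpha: float = 1.0, beta: float = 0.01) -> List[Frame]:
    """Generic power spectral subtraction with a spectral floor.

    Each bin becomes max(P - alpha * N, beta * P), where P is the noisy power
    and N is the estimated noise power, so the output never goes negative.
    """
    noise = estimate_noise(frames, n_noise_frames)
    return [
        [max(p - alpha * n, beta * p) for p, n in zip(frame, noise)]
        for frame in frames
    ]


if __name__ == "__main__":
    # Toy values for illustration, not data from the paper.
    noisy = [[1.0, 2.0], [1.0, 2.0], [5.0, 2.5]]
    print(spectral_subtract(noisy))
```

The floor `beta * P` is one common way to avoid negative power estimates after subtraction. Practical systems typically update the noise estimate over time instead of relying only on the leading frames.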
Pages: 329–340
Number of pages: 12