Comparing Front-End Enhancement Techniques and Multiconditioned Training for Robust Automatic Speech Recognition

被引：1

作者：

Soni, Meet H. ^{[1
]}

Joshi, Sonal ^{[1
]}

Panda, Ashish ^{[1
]}

机构：

[1] TCS Innovat Labs, Mumbai, Maharashtra, India

来源：

TEXT, SPEECH, AND DIALOGUE (TSD 2019) | 2019年 / 11697卷

关键词：

Speech recognition; Noise robustness; Front-end processing; Multiconditioned training; FEATURES; NOISE;

D O I：

10.1007/978-3-030-27947-9_28

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We present comparison of various front-end enhancement techniques and multiconditioned training for robust Automatic Speech Recognition (ASR) for additive noise. We compare De-noising Autoencoders (DAEs) based on Deep Neural Network (DNN), Time-Delay Neural Network (TDNN) architecture, and Time-Frequency (T-F) masking based DNN based front-ends. We train these front-ends and evaluate their performance on various seen/unseen noise conditions. In multiconditioned training, we train acoustic model on various noise conditions and test on seen/unseen noises along with Noise Aware Training (NAT). The results suggest that all front-ends provide performance improvement for seen noise conditions while degrading performance for unseen noise conditions. TDNN-DAE provides the most improvement for seen conditions while giving the most degradation for unseen conditions. We use a method to improve performance of TDNN-DAE in unseen conditions by training it on features enhanced using Vector Taylor Series with Acoustic Masking (VTS-AM) and Spectral Subtraction (SS). We show that these enhancement techniques improve the efficacy of the TDNN-DAE significantly in unseen noise conditions. Overall we observed that multiconditioned training still gives better performance in case of both seen/unseen noise conditions, although the enhanced TDNN-DAE comes closest among all the front-ends to the performance of multiconditioned training.

引用

页码：329 / 340

页数：12

共 50 条

[1] A Front-End Speech Enhancement System for Robust Automotive Speech Recognition
Wang, Haikun
Ye, Zhongfu
Chen, Jingdong
2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 1 - 5
[2] Enhanced Sparse Imputation Techniques for a Robust Speech Recognition Front-End
Tan, Qun Feng
Georgiou, Panayiotis G.
Narayanan, Shrikanth
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (08): : 2418 - 2429
[3] An efficient front-end for automatic speech recognition
Ahadi, SM
Sheikhzadeh, H
Brennan, RL
Freeman, GH
ICECS 2003: PROCEEDINGS OF THE 2003 10TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS, VOLS 1-3, 2003, : 128 - 131
[4] A robust front-end for telephone speech recognition
Cho, HY
Chi, SM
Oh, YH
PRICAI'98: TOPICS IN ARTIFICIAL INTELLIGENCE, 1998, 1531 : 636 - 644
[5] Automatic Speech Recognition with a Cochlear Implant Front-End
Nogueira, Waldo
Harczos, Tamas
Edler, Bernd
Ostermann, Joern
Buechner, Andreas
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1993 - +
[6] A Front-End Technique for Automatic Noisy Speech Recognition
Naing, Hay Mar Soe
Hidayat, Risanuri
Hartanto, Rudy
Miyanaga, Yoshikazu
PROCEEDINGS OF 2020 23RD CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (ORIENTAL-COCOSDA 2020), 2020, : 49 - 54
[7] A comparison of front-end configurations for robust speech recognition
Milner, B
2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 797 - 800
[8] A Unified Front-end Anti-interference Approach for Robust Automatic Speech Recognition
Liang, Yunming
Zhou, Yi
Ma, Yongbao
Liu, Hongqing
2019 IEEE 19TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2019), 2019,
[9] Robust connected digit recognition using speech enhancement and an auditory model front-end
Flynn, Ronan
Jones, Edward
2007 6TH INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS & SIGNAL PROCESSING, VOLS 1-4, 2007, : 410 - +
[10] Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition
Narayanan, Arun
Wang, DeLiang
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 826 - 835

← 1 2 3 4 5 →