Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition

Cited by: 93

Authors:

Narayanan, Arun [1]

Wang, DeLiang [1,2]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2014, Vol. 22, No. 4

Keywords:

Aurora-4; deep neural networks; feature mapping; robust ASR; time-frequency masking; INTELLIGIBILITY; BINARY; ALGORITHM;

DOI:

10.1109/TASLP.2014.2305833

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Recently, supervised classification has been shown to work well for the task of speech separation. We perform an in-depth evaluation of such techniques as a front-end for noise-robust automatic speech recognition (ASR). The proposed separation front-end consists of two stages. The first stage removes additive noise via time-frequency masking. The second stage addresses channel mismatch and the distortions introduced by the first stage; a non-linear function is learned that maps the masked spectral features to their clean counterpart. Results show that the proposed front-end substantially improves ASR performance when the acoustic models are trained in clean conditions. We also propose a diagonal feature discriminant linear regression (dFDLR) adaptation that can be performed on a per-utterance basis for ASR systems employing deep neural networks and HMM. Results show that dFDLR consistently improves performance in all test conditions. Surprisingly, the best average results are obtained when dFDLR is applied to models trained using noisy log-Mel spectral features from the multi-condition training set. With no channel mismatch, the best results are obtained when the proposed speech separation front-end is used along with multi-condition training using log-Mel features followed by dFDLR adaptation. Both these results are among the best on the Aurora-4 dataset.
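
The first stage described in the abstract, noise removal via time-frequency masking, can be illustrated with a minimal sketch. It assumes an oracle ratio mask computed from known speech and noise powers, which stands in for the learned mask estimator; the function names, array shapes, and random data are hypothetical and are not taken from the paper.

```python
import numpy as np

def ratio_mask(speech_power, noise_power, eps=1e-12):
    """Oracle ratio mask in [0, 1] for each time-frequency unit (illustrative only)."""
    return speech_power / (speech_power + noise_power + eps)

def apply_mask(noisy_power, mask):
    """Attenuate time-frequency units in proportion to how noise-dominated they are."""
    return noisy_power * mask

# Hypothetical toy data: 5 frames x 8 frequency channels of power values.
rng = np.random.default_rng(0)
speech = rng.uniform(0.1, 1.0, size=(5, 8))
noise = rng.uniform(0.1, 1.0, size=(5, 8))
noisy = speech + noise  # assumes noise is additive in the power domain

mask = ratio_mask(speech, noise)
masked = apply_mask(noisy, mask)
log_features = np.log(masked + 1e-8)  # log-compressed features for a recognizer
```

In practice the mask would have to be estimated from the noisy signal alone, and the second stage in the abstract would then map the masked features toward their clean counterparts; neither step is shown here.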

Pages: 826-835

Page count: 10
