DUAL APPLICATION OF SPEECH ENHANCEMENT FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 19

Authors:

Pandey, Ashutosh [1, 3]

Liu, Chunxi [2]

Wang, Yun [2]

Saraf, Yatharth [2]

Affiliations:

[1] Ohio State Univ, Columbus, OH 43210 USA

[2] Facebook AI, Menlo Pk, CA USA

[3] Facebook, Menlo Pk, CA USA

Source:

2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021

Keywords:

speech enhancement; speech recognition; recurrent neural network transducer; complex spectral mapping; consistency loss

DOI:

10.1109/SLT48900.2021.9383624

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this work, we exploit speech enhancement for improving a recurrent neural network transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent network (DCRN) for complex spectral mapping based speech enhancement, and find it helpful for ASR in two ways: a data augmentation technique, and a preprocessing frontend. In using it for ASR data augmentation, we exploit a KL divergence based consistency loss that is computed between the ASR outputs of original and enhanced utterances. In using speech enhancement as an effective ASR frontend, we propose a three-step training scheme based on model pretraining and feature selection. We evaluate our proposed techniques on a challenging social media English video dataset, and achieve an average relative improvement of 11.2% with speech enhancement based data augmentation, 8.3% with enhancement based preprocessing, and 13.4% when combining both.
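The abstract describes a KL divergence based consistency loss computed between the ASR outputs of original and enhanced utterances. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the ASR outputs are available as per-step logit vectors over a shared vocabulary, and the function names (`softmax`, `kl_divergence`, `consistency_loss`), the KL direction (original relative to enhanced), and the averaging over steps are all illustrative assumptions; the record does not specify them, nor how the term is weighted against the RNN-T training loss.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_original, logits_enhanced):
    """Mean per-step KL divergence between two sequences of output logits.

    Each argument is a list of per-step logit vectors (hypothetical layout):
    one produced by the ASR model on the original utterance, the other on
    its enhanced version. Direction and averaging are assumptions.
    """
    if not logits_original or len(logits_original) != len(logits_enhanced):
        raise ValueError("inputs must be non-empty and of equal length")
    per_step = [
        kl_divergence(softmax(a), softmax(b))
        for a, b in zip(logits_original, logits_enhanced)
    ]
    return sum(per_step) / len(per_step)
```

In a training setup of this kind, such a term would typically be added to the main ASR loss with a tunable weight, so that the model is encouraged to produce similar output distributions for an utterance and its enhanced counterpart.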


Pages: 223-228

Number of pages: 6
