NOISE ROBUST SPEECH RECOGNITION ON AURORA4 BY HUMANS AND MACHINES

Cited by: 0

Authors:

Qian, Yanmin [1, 2]

Tan, Tian [1]

Hu, Hu [1]

Liu, Qi [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China

[2] Tencent, Tencent AI Lab, Bellevue, WA 98004 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

robust speech recognition; very deep convolution residual network; cluster adaptive training; future-vector;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Although great progress has been made in automatic speech recognition (ASR), significant performance degradation still exists in noisy environments. Building on our previously introduced very deep CNNs, this paper further integrates residual learning and evaluates the resulting very deep convolutional residual network (VDCRN) in noisy conditions, where it shows stronger robustness. Then, cluster adaptive training (CAT) is developed on the VDCRN to reduce the mismatch between training and testing in noisy scenarios. Moreover, an advanced future-vector assisted LSTM-RNN LM is proposed to achieve a further gain. All the proposed approaches are evaluated on Aurora4, and each shows a significant improvement. The final system achieves 3.09% WER on Aurora4, which approaches human performance on this task. This is a new milestone for noise-robust ASR on this benchmark.


Pages: 5604-5608

Page count: 5
