NOISE ROBUST SPEECH RECOGNITION ON AURORA4 BY HUMANS AND MACHINES

Cited by: 0

Authors:

Qian, Yanmin [1,2]

Tan, Tian [1]

Hu, Hu [1]

Liu, Qi [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China

[2] Tencent, Tencent AI Lab, Bellevue, WA 98004 USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

robust speech recognition; very deep convolution residual network; cluster adaptive training; future-vector;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Although great progress has been made in automatic speech recognition (ASR), significant performance degradation still exists in noisy environments. Building on our previously introduced very deep CNNs, this paper further integrates residual learning and evaluates the resulting very deep convolutional residual network (VDCRN) in noisy conditions, where it shows stronger robustness. Then, cluster adaptive training (CAT) is developed on the VDCRN to reduce the mismatch between training and testing in noisy scenarios. Moreover, an advanced future-vector assisted LSTM-RNN LM is proposed to achieve a further gain. All the proposed approaches are evaluated on Aurora4, and each technique yields a significant improvement. The final system achieves 3.09% WER on Aurora4, which approaches human performance on this task. This is a new milestone for noise-robust ASR on this benchmark.


Pages: 5604-5608

Page count: 5
