NOISE ROBUST SPEECH RECOGNITION USING RECENT DEVELOPMENTS IN NEURAL NETWORKS FOR COMPUTER VISION

Cited by: 0
Authors
Yoshioka, Takuya [1 ]
Ohnishi, Katsunori [1 ,2 ]
Fang, Fuming [1 ,3 ]
Nakatani, Toniohiro [1 ]
Affiliations
[1] NTT Corp, Tokyo, Japan
[2] Univ Tokyo, Tokyo 113-8654, Japan
[3] Tokyo Inst Technol, Tokyo, Japan
Keywords
Automatic speech recognition; noise robustness; convolutional neural network; parametric rectified linear unit
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Convolutional Neural Networks (CNNs) are superior to fully connected neural networks in various speech recognition tasks, and the advantage is especially pronounced in noisy environments. In recent years, many techniques have been proposed in the computer vision community to improve the classification performance of CNNs. This paper considers two approaches recently developed for image classification and examines their impact on noisy speech recognition performance. The first approach is to increase the depth of the convolution layers. Different approaches to deepening the CNNs are compared. In particular, the usefulness of learning dynamic features with small convolution layers that perform convolution in time is shown, along with a modulation frequency analysis of the learned convolution filters. The second approach is to use trainable activation functions; specifically, the use of a Parametric Rectified Linear Unit (PReLU) is investigated. Experimental results show that both approaches yield significant improvements in performance. Combining the two approaches further reduces recognition errors, producing a word error rate of 11.1% on the Aurora-4 task, the best published result for this corpus with a standard one-pass bi-gram decoding set-up.
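To make the two ideas in the abstract concrete, here is a minimal sketch in PyTorch (the framework choice, layer widths, and input dimensions are all my assumptions for illustration; the paper does not specify an implementation). It stacks small convolutions that operate along the time axis only, which can learn dynamic (delta-like) features from static features, and uses PReLU activations, f(x) = max(0, x) + a * min(0, x), where the negative slope a is learned.

```python
# Hypothetical sketch of the two techniques described in the abstract:
# small time-only convolutions for depth, plus trainable PReLU activations.
# Not the paper's architecture; all sizes are illustrative.
import torch
import torch.nn as nn

# Input: a window of log filterbank features, shaped
# [batch, channels=1, time frames, frequency bins].
model = nn.Sequential(
    # Small convolutions along the time axis only (3x1 kernels), stacked
    # to deepen the network while learning dynamic features in time.
    nn.Conv2d(1, 32, kernel_size=(3, 1), padding=(1, 0)),
    nn.PReLU(num_parameters=32),  # slope a learned per channel
    nn.Conv2d(32, 32, kernel_size=(3, 1), padding=(1, 0)),
    nn.PReLU(num_parameters=32),
    # A conventional time-frequency convolution after the time-only layers.
    nn.Conv2d(32, 64, kernel_size=(3, 3), padding=(1, 1)),
    nn.PReLU(num_parameters=64),
)

x = torch.randn(8, 1, 11, 40)  # batch of 8: 11 frames x 40 filterbank bins
y = model(x)
print(y.shape)                 # torch.Size([8, 64, 11, 40])
```

With `num_parameters` set to the channel count, each PReLU learns one slope per feature map, which is the per-channel variant of the trainable activation the abstract refers to.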
Pages: 5730-5734
Number of pages: 5
Related Papers
50 records in total
  • [1] AN INVESTIGATION OF DEEP NEURAL NETWORKS FOR NOISE ROBUST SPEECH RECOGNITION
    Seltzer, Michael L.
    Yu, Dong
    Wang, Yongqiang
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7398 - 7402
  • [2] Factored deep convolutional neural networks for noise robust speech recognition
    Fujimoto, Masakiyo
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3837 - 3841
  • [3] Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition
    Qian, Yanmin
    Bi, Mengxiao
    Tan, Tian
    Yu, Kai
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (12) : 2263 - 2276
  • [4] Noise Robust Speech Recognition Using Deep Belief Networks
    Farahat, Mahboubeh
    Halavati, Ramin
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2016, 15 (01)
  • [5] A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks
    Li, Bo
    Sim, Khe Chai
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (08) : 1296 - 1305
  • [6] An Efficient Noise-Robust Automatic Speech Recognition System using Artificial Neural Networks
    Gupta, Santosh
    Bhurchandi, Kishor M.
    Keskar, Avinash G.
    [J]. 2016 INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), VOL. 1, 2016, : 1873 - 1877
  • [7] Speech recognition using stereo vision neural networks with competition and cooperation
Kim, SI
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2005, PT 2, PROCEEDINGS, 2005, 3497 : 333 - 338
  • [8] Robust speech recognition using fuzzy matrix quantisation and neural networks
    Xydeas, CS
    Lin, C
    [J]. 1996 INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY, VOLUMES 1 AND 2 - PROCEEDINGS, 1996, : 432 - 435
  • [9] SPEECH SEPARATION BASED ON SIGNAL-NOISE-DEPENDENT DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Tu, Yan-Hui
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 61 - 65
  • [10] Noise-robust speech recognition in mobile network based on convolution neural networks
Bouchakour, Lallouani
Debyeche, Mohamed
    [J]. International Journal of Speech Technology, 2022, 25 : 269 - 277