NOISE ROBUST SPEECH RECOGNITION USING RECENT DEVELOPMENTS IN NEURAL NETWORKS FOR COMPUTER VISION

Cited by: 0
Authors
Yoshioka, Takuya [1 ]
Ohnishi, Katsunori [1 ,2 ]
Fang, Fuming [1 ,3 ]
Nakatani, Toniohiro [1 ]
Affiliations
[1] NTT Corp, Tokyo, Japan
[2] Univ Tokyo, Tokyo 113-8654, Japan
[3] Tokyo Inst Technol, Tokyo, Japan
Keywords
Automatic speech recognition; noise robustness; convolutional neural network; parametric rectified linear unit
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Convolutional Neural Networks (CNNs) are superior to fully connected neural networks in various speech recognition tasks, and the advantage is especially pronounced in noisy environments. In recent years, many techniques have been proposed in the computer vision community to improve the classification performance of CNNs. This paper considers two approaches recently developed for image classification and examines their impact on noisy speech recognition performance. The first approach is to increase the depth of the convolution layers. Different approaches to deepening the CNNs are compared. In particular, the usefulness of learning dynamic features with small convolution layers that perform convolution in time is shown, along with a modulation frequency analysis of the learned convolution filters. The second approach is to use trainable activation functions; specifically, the use of a Parametric Rectified Linear Unit (PReLU) is investigated. Experimental results show that both approaches yield significant improvements in performance. Combining the two approaches further reduces recognition errors, producing a word error rate of 11.1% on the Aurora-4 task, the best published result for this corpus with a standard one-pass bi-gram decoding set-up.
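To make the two ideas in the abstract concrete, here is a minimal sketch in PyTorch (the framework choice, layer widths, and input dimensions are all my assumptions for illustration; the paper does not specify an implementation). It stacks small convolutions that operate along the time axis only, which can learn dynamic (delta-like) features from static features, and uses PReLU activations, f(x) = max(0, x) + a * min(0, x), where the negative slope a is learned.

```python
# Hypothetical sketch of the two techniques described in the abstract:
# small time-only convolutions for depth, plus trainable PReLU activations.
# Not the paper's architecture; all sizes are illustrative.
import torch
import torch.nn as nn

# Input: a window of log filterbank features, shaped
# [batch, channels=1, time frames, frequency bins].
model = nn.Sequential(
    # Small convolutions along the time axis only (3x1 kernels), stacked
    # to deepen the network while learning dynamic features in time.
    nn.Conv2d(1, 32, kernel_size=(3, 1), padding=(1, 0)),
    nn.PReLU(num_parameters=32),  # slope a learned per channel
    nn.Conv2d(32, 32, kernel_size=(3, 1), padding=(1, 0)),
    nn.PReLU(num_parameters=32),
    # A conventional time-frequency convolution after the time-only layers.
    nn.Conv2d(32, 64, kernel_size=(3, 3), padding=(1, 1)),
    nn.PReLU(num_parameters=64),
)

x = torch.randn(8, 1, 11, 40)  # batch of 8: 11 frames x 40 filterbank bins
y = model(x)
print(y.shape)                 # torch.Size([8, 64, 11, 40])
```

With `num_parameters` set to the channel count, each PReLU learns one slope per feature map, which is the per-channel variant of the trainable activation the abstract refers to.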
Pages: 5730-5734
Number of pages: 5
Related Papers
50 records in total
  • [1] AN INVESTIGATION OF DEEP NEURAL NETWORKS FOR NOISE ROBUST SPEECH RECOGNITION
    Seltzer, Michael L.
    Yu, Dong
    Wang, Yongqiang
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7398 - 7402
  • [2] Factored deep convolutional neural networks for noise robust speech recognition
    Fujimoto, Masakiyo
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3837 - 3841
  • [3] Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition
    Qian, Yanmin
    Bi, Mengxiao
    Tan, Tian
    Yu, Kai
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (12) : 2263 - 2276
  • [4] Noise Robust Speech Recognition Using Deep Belief Networks
    Farahat, Mahboubeh
    Halavati, Ramin
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2016, 15 (01)
  • [5] A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks
    Li, Bo
    Sim, Khe Chai
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (08) : 1296 - 1305
  • [6] An Efficient Noise-Robust Automatic Speech Recognition System using Artificial Neural Networks
    Gupta, Santosh
    Bhurchandi, Kishor M.
    Keskar, Avinash G.
    [J]. 2016 INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), VOL. 1, 2016, : 1873 - 1877
  • [7] Speech recognition using stereo vision neural networks with competition and cooperation
Kim, SI
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2005, PT 2, PROCEEDINGS, 2005, 3497 : 333 - 338
  • [8] Robust speech recognition using fuzzy matrix quantisation and neural networks
    Xydeas, CS
    Lin, C
    [J]. 1996 INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY, VOLUMES 1 AND 2 - PROCEEDINGS, 1996, : 432 - 435
  • [9] SPEECH SEPARATION BASED ON SIGNAL-NOISE-DEPENDENT DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Tu, Yan-Hui
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 61 - 65
  • [10] Noise-robust speech recognition in mobile network based on convolution neural networks
Bouchakour, Lallouani
Debyeche, Mohamed
    [J]. International Journal of Speech Technology, 2022, 25 : 269 - 277