Long Short-Term Memory based Convolutional Recurrent Neural Networks for Large Vocabulary Speech Recognition

Cited by: 0
Authors
Li, Xiangang [1]
Wu, Xihong [1]
Affiliations
[1] Peking Univ, Speech & Hearing Res Ctr, Key Lab Machine Percept, Minist Educ, Beijing 100871, Peoples R China
Keywords
speech recognition; long short-term memory; recurrent neural network; convolutional neural networks; gradient descent
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to give state-of-the-art performance on many speech recognition tasks, as they can learn a dynamically changing contextual window over the entire sequence history. Convolutional neural networks (CNNs), on the other hand, have brought significant improvements to deep feed-forward neural networks (FFNNs), as they are better able to reduce spectral variation in the input signal. In this paper, a network architecture called the convolutional recurrent neural network (CRNN) is proposed, combining the CNN and the LSTM RNN. In the proposed CRNNs, each speech frame, without adjacent context frames, is organized as a number of local feature patches along the frequency axis, and an LSTM network is then run over each feature patch along the time axis. We train and compare FFNNs, LSTM RNNs and the proposed LSTM CRNNs across a range of configurations. Experimental results show that the LSTM CRNNs can exceed state-of-the-art speech recognition performance.
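The patch-then-recur idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the names `frequency_patches`, `TinyLSTM` and `crnn_layer`, the patch size, the stride, the hidden size, the random weights, and the choice to share one set of LSTM weights across all patches (by analogy with a convolution kernel) are all assumptions made for illustration.

```python
import math
import random


def frequency_patches(frame, patch_size, stride):
    """Split one frame (a list of spectral features) into local patches
    along the frequency axis. Patch size and stride are illustrative."""
    return [frame[i:i + patch_size]
            for i in range(0, len(frame) - patch_size + 1, stride)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """A small standard LSTM cell (hypothetical helper, random weights).
    Assumption: the same weights are reused for every frequency patch."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, g = candidate cell value.
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                         for _ in range(hidden_size)] for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifog"}
        self.hidden_size = hidden_size

    def step(self, x, h, c):
        z = x + h  # concatenate the patch with the previous hidden state

        def affine(gate):
            return [sum(w * v for w, v in zip(row, z)) + bias
                    for row, bias in zip(self.W[gate], self.b[gate])]

        i = [sigmoid(a) for a in affine("i")]
        f = [sigmoid(a) for a in affine("f")]
        o = [sigmoid(a) for a in affine("o")]
        g = [math.tanh(a) for a in affine("g")]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def crnn_layer(frames, patch_size, stride, lstm):
    """Run the LSTM along the time axis separately for each frequency
    patch, then concatenate the per-patch outputs for every frame."""
    num_patches = (len(frames[0]) - patch_size) // stride + 1
    size = lstm.hidden_size
    # Each patch position keeps its own recurrent state (h, c) over time.
    states = [([0.0] * size, [0.0] * size) for _ in range(num_patches)]
    outputs = []
    for frame in frames:  # single frames, no adjacent context frames
        frame_out = []
        for p, patch in enumerate(frequency_patches(frame, patch_size, stride)):
            h, c = lstm.step(patch, *states[p])
            states[p] = (h, c)
            frame_out.extend(h)
        outputs.append(frame_out)
    return outputs


# Toy input: 5 frames of 40 made-up feature values each.
rng = random.Random(1)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(40)] for _ in range(5)]
lstm = TinyLSTM(input_size=8, hidden_size=3)
outputs = crnn_layer(frames, patch_size=8, stride=4, lstm=lstm)
```

With these toy settings each 40-dimensional frame yields 9 overlapping patches, so every output frame has 9 x 3 = 27 values. A real system would train the weights and stack further layers on top; the abstract does not specify those details, so they are left out here.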
Pages: 3219-3223
Number of pages: 5