IMPROVING LONG SHORT-TERM MEMORY NETWORKS USING MAXOUT UNITS FOR LARGE VOCABULARY SPEECH RECOGNITION

Cited by: 0
Authors
Li, Xiangang [1 ]
Wu, Xihong [1 ]
Affiliations
[1] Peking Univ, Speech & Hearing Res Ctr, Key Lab Machine Percept, Minist Educ, Beijing 100871, Peoples R China
Keywords
long short-term memory; maxout; deep neural network; acoustic modeling; large vocabulary speech recognition; GRADIENT DESCENT; NEURAL-NETWORKS; RECURRENT;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Long short-term memory (LSTM) recurrent neural networks have been shown to give state-of-the-art performance on many speech recognition tasks. To achieve a further performance improvement, this paper proposes integrating maxout units into the LSTM cells, given that these units have brought significant improvements to deep feed-forward neural networks. A novel architecture is constructed by replacing the input activation units (generally tanh) in the LSTM networks with maxout units. We implemented LSTM network training on multi-GPU devices with truncated BPTT and empirically evaluated the proposed designs on a large vocabulary Mandarin conversational telephone speech recognition task. The experimental results support our claim that the performance of LSTM-based acoustic models can be further improved using maxout units.
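For illustration only (the authors' exact formulation and implementation are not given in this record), the NumPy sketch below shows one possible reading of the described architecture: a single LSTM step in which the input activation, ordinarily tanh, is replaced by a maxout over k candidate linear projections. The function name maxout_lstm_step, the parameter-dictionary layout, the pooling factor k, the omission of peephole connections, and the retained tanh on the cell output are all assumptions made for this sketch.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def maxout_lstm_step(x, h_prev, c_prev, params, k=2):
        """One LSTM step whose input activation is a maxout unit (illustrative sketch)."""
        # Standard logistic gates (peephole connections omitted in this sketch).
        i = sigmoid(params["W_i"] @ x + params["U_i"] @ h_prev + params["b_i"])
        f = sigmoid(params["W_f"] @ x + params["U_f"] @ h_prev + params["b_f"])
        o = sigmoid(params["W_o"] @ x + params["U_o"] @ h_prev + params["b_o"])
        # Input activation: elementwise max over k candidate affine projections
        # instead of the usual tanh (the maxout idea applied inside the cell).
        pieces = [params["W_g"][j] @ x + params["U_g"][j] @ h_prev + params["b_g"][j]
                  for j in range(k)]
        g = np.max(np.stack(pieces), axis=0)
        c = f * c_prev + i * g          # cell state update
        h = o * np.tanh(c)              # output activation kept as tanh (assumption)
        return h, c

    # Minimal usage with random parameters; all dimensions are arbitrary.
    n_in, n_hid, k = 4, 3, 2
    rng = np.random.default_rng(0)
    params = {name: rng.standard_normal((n_hid, n_in)) for name in ("W_i", "W_f", "W_o")}
    params.update({name: rng.standard_normal((n_hid, n_hid)) for name in ("U_i", "U_f", "U_o")})
    params.update({name: rng.standard_normal(n_hid) for name in ("b_i", "b_f", "b_o")})
    params["W_g"] = rng.standard_normal((k, n_hid, n_in))
    params["U_g"] = rng.standard_normal((k, n_hid, n_hid))
    params["b_g"] = rng.standard_normal((k, n_hid))
    h, c = maxout_lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), params, k)

In this reading, maxout simply takes the elementwise maximum of several affine projections, so no squashing nonlinearity is applied to the cell input; whether the output activation is also replaced is not specified by the abstract and is left as tanh above.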
Pages: 4600 - 4604
Number of pages: 5
Related papers
50 records in total
  • [1] Long Short-Term Memory based Convolutional Recurrent Neural Networks for Large Vocabulary Speech Recognition
    Li, Xiangang
    Wu, Xihong
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3219 - 3223
  • [2] CONSTRUCTING LONG SHORT-TERM MEMORY BASED DEEP RECURRENT NEURAL NETWORKS FOR LARGE VOCABULARY SPEECH RECOGNITION
Li, Xiangang
    Wu, Xihong
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4520 - 4524
  • [3] Deep Long Short-Term Memory Networks for Speech Recognition
    Chien, Jen-Tzung
    Misbullah, Alim
    [J]. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [4] Long Short-Term Memory Networks for Noise Robust Speech Recognition
    Woellmer, Martin
    Sun, Yang
    Eyben, Florian
    Schuller, Bjoern
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2966 - 2969
  • [5] Modeling Speaker Variability Using Long Short-Term Memory Networks for Speech Recognition
    Li, Xiangang
    Wu, Xihong
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1086 - 1090
  • [6] Endpoint Detection using Grid Long Short-Term Memory Networks for Streaming Speech Recognition
    Chang, Shuo-Yiin
    Li, Bo
    Sainath, Tara N.
    Simko, Gabor
    Parada, Carolina
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3812 - 3816
  • [7] Long Short-term Memory for Tibetan Speech Recognition
    Wang, Weizhe
    Chen, Ziyan
    Yang, Hongwu
    [J]. PROCEEDINGS OF 2020 IEEE 4TH INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2020), 2020, : 1059 - 1063
  • [8] Speech Emotion Recognition for Indonesian Language Using Long Short-Term Memory
    Lasiman, Jeremia Jason
    Lestari, Dessi Puji
    [J]. 2018 INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL, INFORMATICS AND ITS APPLICATIONS (IC3INA), 2018, : 40 - 43
  • [9] BIDIRECTIONAL QUATERNION LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS FOR SPEECH RECOGNITION
    Parcollet, Titouan
    Morchid, Mohamed
    Linares, Georges
    De Mori, Renato
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 8519 - 8523
  • [10] Emotion Recognition From Speech and Text using Long Short-Term Memory
    Venkateswarlu, Sonagiri China
    Jeevakala, Siva Ramakrishna
    Kumar, Naluguru Udaya
    Munaswamy, Pidugu
    Pendyala, Dhanalaxmi
    [J]. ENGINEERING TECHNOLOGY & APPLIED SCIENCE RESEARCH, 2023, 13 (04) : 11166 - 11169