Using Highway Connections to Enable Deep Small-footprint LSTM-RNNs for Speech Recognition

Cited by: 7
Authors
Cheng Gaofeng [1, 2]
Li Xin [1, 2]
Yan Yonghong [1, 2]
Affiliations
[1] Institute of Acoustics, Key Laboratory of Speech Acoustics and Content Understanding, Beijing 100190, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Long short-term memory; Highway connections; Small-footprint; Speech recognition; Neural networks
DOI
10.1049/cje.2018.11.008
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809
Abstract
Long short-term memory RNNs (LSTM-RNNs) have shown great success in the automatic speech recognition (ASR) field and have become the state-of-the-art acoustic model for time-sequence modeling tasks. However, it is still difficult to train deep LSTM-RNNs while keeping the number of parameters small. We use highway connections between the memory cells of adjacent layers to train small-footprint highway LSTM-RNNs (HLSTM-RNNs), which are deeper and thinner than conventional LSTM-RNNs. Experiments on the Switchboard (SWBD) corpus indicate that we can train thinner and deeper HLSTM-RNNs that have fewer parameters than the conventional 3-layer LSTM-RNNs yet achieve a lower word error rate (WER). Compared with their small-footprint LSTM-RNN counterparts, the small-footprint HLSTM-RNNs yield a greater WER reduction.
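The abstract names the mechanism but not its equations. As an illustration only, the sketch below assumes one common way to realize a highway connection between the memory cells of adjacent LSTM layers: a depth gate d, driven by the layer input, the layer's previous cell state and the lower layer's current cell state, adds a gated copy of the lower cell state c_below into the current cell (c = f*c_prev + i*g + d*c_below). This gating form, the omission of peepholes and projection layers, all names (HighwayLSTMLayer, run_stack, lstm_params, hlstm_params) and all layer sizes are assumptions, not the authors' implementation. The two counting helpers show how a deeper, thinner highway stack can still have fewer parameters than a shallower, wider plain LSTM.

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def vadd(*vs):
    return [sum(t) for t in zip(*vs)]


def vmul(a, b):
    return [x * y for x, y in zip(a, b)]


class HighwayLSTMLayer:
    """One highway-LSTM layer (assumed formulation, hypothetical names)."""

    def __init__(self, n_in, n_cell, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_cell = n_cell
        # Input-to-gate weights for input (i), forget (f), output (o), candidate (g), depth (d).
        self.W = {k: mat(n_cell, n_in) for k in "ifogd"}
        self.U = {k: mat(n_cell, n_cell) for k in "ifog"}  # recurrent weights
        self.b = {k: [0.0] * n_cell for k in "ifogd"}  # biases
        # Element-wise weights feeding cell states into the depth gate.
        self.w_cd = [rng.uniform(-0.1, 0.1) for _ in range(n_cell)]
        self.w_ld = [rng.uniform(-0.1, 0.1) for _ in range(n_cell)]

    def step(self, x, h_prev, c_prev, c_below):
        pre = {k: vadd(matvec(self.W[k], x), matvec(self.U[k], h_prev), self.b[k])
               for k in "ifog"}
        i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        g = tanh(pre["g"])
        d = sigmoid(vadd(matvec(self.W["d"], x), vmul(self.w_cd, c_prev),
                         vmul(self.w_ld, c_below), self.b["d"]))
        # Highway connection: the gated lower-layer cell state enters this cell directly.
        c = vadd(vmul(f, c_prev), vmul(i, g), vmul(d, c_below))
        h = vmul(o, tanh(c))
        return h, c

    def num_params(self):
        mats = list(self.W.values()) + list(self.U.values())
        return (sum(len(m) * len(m[0]) for m in mats)
                + sum(len(v) for v in self.b.values()) + len(self.w_cd) + len(self.w_ld))


def run_stack(layers, frames):
    """Run stacked layers (equal cell counts) over a sequence; return top-layer outputs."""
    n = layers[0].n_cell
    h = [[0.0] * n for _ in layers]
    c = [[0.0] * n for _ in layers]
    outputs = []
    for x in frames:
        inp, c_below = x, [0.0] * n  # the bottom layer has no lower cell to carry up
        for l, layer in enumerate(layers):
            h[l], c[l] = layer.step(inp, h[l], c[l], c_below)
            inp, c_below = h[l], c[l]  # feed this layer's current h and c to the next layer
        outputs.append(h[-1])
    return outputs


def lstm_params(n_in, n_cell, n_layers):
    """Plain LSTM stack (no peepholes, no projection): 4 gates of W, U, b per layer."""
    return sum(4 * ((n_in if l == 0 else n_cell) * n_cell + n_cell * n_cell + n_cell)
               for l in range(n_layers))


def hlstm_params(n_in, n_cell, n_layers):
    """Highway stack as sketched above: 5 W, 4 U, 5 b and 2 depth-gate vectors per layer."""
    return sum(5 * (n_in if l == 0 else n_cell) * n_cell + 4 * n_cell * n_cell + 7 * n_cell
               for l in range(n_layers))


if __name__ == "__main__":
    # Hypothetical sizes: 40-dim features, 3x1024 plain LSTM vs 8x512 highway LSTM.
    print(lstm_params(40, 1024, 3), hlstm_params(40, 512, 8))  # 21147648 17694720
```

With these made-up sizes the 8-layer, 512-cell highway stack needs roughly 17.7M parameters against about 21.1M for the 3-layer, 1024-cell plain LSTM; real configurations (projections, peepholes, output layers) would change the exact numbers.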
Pages: 107-112
Page count: 6
Source: Chinese Journal of Electronics, 2019, 28(01): 107-112