Small-Footprint Highway Deep Neural Networks for Speech Recognition

Cited by: 13
Authors
Lu, Liang [1]
Renals, Steve [2]
Affiliations
[1] Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
[2] University of Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Deep learning; highway networks; small-footprint models; speech recognition; hidden unit contributions
DOI
10.1109/TASLP.2017.2698723
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
State-of-the-art speech recognition systems typically employ neural network acoustic models. However, compared to Gaussian mixture models, deep neural network (DNN) based acoustic models often have many more model parameters, making it challenging for them to be deployed on resource-constrained platforms, such as mobile devices. In this paper, we study the application of the recently proposed highway deep neural network (HDNN) for training small-footprint acoustic models. HDNNs are depth-gated feedforward neural networks that include two types of gate functions to facilitate the information flow through different layers. Our study demonstrates that HDNNs are more compact than regular DNNs for acoustic modeling, i.e., they can achieve comparable recognition accuracy with many fewer model parameters. Furthermore, HDNNs are more controllable than DNNs: the gate functions of an HDNN can control the behavior of the whole network using a very small number of model parameters. Finally, we show that HDNNs are more adaptable than DNNs. For example, simply updating the gate functions using adaptation data can result in considerable gains in accuracy. We demonstrate these aspects by experiments using the publicly available AMI corpus, which has around 80 hours of training data.
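The abstract does not give the layer equations, so the following is only a minimal sketch of a depth-gated (highway) layer, assuming the commonly used formulation in which a transform gate scales the ordinary hidden-layer output and a carry gate scales the layer input. The function names, the layer sizes, the sigmoid nonlinearity, and the choice to share one set of gate parameters across all layers are illustrative assumptions, not details confirmed by this record.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def highway_layer(h, W, b, W_T, b_T, W_C, b_C):
    """One highway layer (assumed form): candidate * T(h) + h * C(h)."""
    candidate = sigmoid(h @ W + b)   # ordinary feedforward transform of the input
    T = sigmoid(h @ W_T + b_T)       # transform gate: how much of the candidate passes
    C = sigmoid(h @ W_C + b_C)       # carry gate: how much of the input is copied forward
    return candidate * T + h * C


rng = np.random.default_rng(0)
dim, n_layers, batch = 8, 3, 4       # hypothetical sizes, chosen only for illustration

# Each layer has its own weights; the two gates are shared by every layer
# (an assumption made here so that the gates stay a small fraction of the parameters).
layers = [(rng.normal(scale=0.1, size=(dim, dim)), np.zeros(dim)) for _ in range(n_layers)]
W_T, b_T = rng.normal(scale=0.1, size=(dim, dim)), np.zeros(dim)
W_C, b_C = rng.normal(scale=0.1, size=(dim, dim)), np.zeros(dim)

x = rng.normal(size=(batch, dim))
h = x
for W, b in layers:
    h = highway_layer(h, W, b, W_T, b_T, W_C, b_C)

# With the transform gate closed and the carry gate open, a layer copies its input.
W0, b0 = layers[0]
copied = highway_layer(x, W0, b0, W_T, np.full(dim, -50.0), W_C, np.full(dim, 50.0))
```

In this sketch the gates hold 2 * (dim * dim + dim) parameters in total regardless of depth, which illustrates how a small set of gate parameters could influence every layer; updating only W_T, b_T, W_C, and b_C would be one hypothetical way to realise the gate-only adaptation the abstract mentions.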
Pages: 1502-1511
Page count: 10