Small-Footprint Highway Deep Neural Networks for Speech Recognition

Cited by: 13

Authors:

Lu, Liang [1]

Renals, Steve [2]

Affiliations:

[1] Toyota Technological Institute at Chicago, Chicago, IL 60637 USA

[2] University of Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2017 / Vol. 25 / Issue 07

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Deep learning; highway networks; small-footprint models; speech recognition; HIDDEN UNIT CONTRIBUTIONS;

DOI:

10.1109/TASLP.2017.2698723

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

State-of-the-art speech recognition systems typically employ neural network acoustic models. However, compared to Gaussian mixture models, deep neural network (DNN) based acoustic models often have many more model parameters, making it challenging for them to be deployed on resource-constrained platforms, such as mobile devices. In this paper, we study the application of the recently proposed highway deep neural network (HDNN) for training small-footprint acoustic models. HDNNs are depth-gated feedforward neural networks that include two types of gate functions to facilitate the information flow through different layers. Our study demonstrates that HDNNs are more compact than regular DNNs for acoustic modeling, i.e., they can achieve comparable recognition accuracy with many fewer model parameters. Furthermore, HDNNs are more controllable than DNNs: the gate functions of an HDNN can control the behavior of the whole network using a very small number of model parameters. Finally, we show that HDNNs are more adaptable than DNNs. For example, simply updating the gate functions using adaptation data can result in considerable gains in accuracy. We demonstrate these aspects by experiments using the publicly available AMI corpus, which has around 80 h of training data.
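
To make the gating idea in the abstract concrete, the following is a minimal NumPy sketch of a single highway layer, assuming the common formulation with a sigmoid transform gate and a sigmoid carry gate. The abstract does not give the equations, so the function name `highway_layer`, the parameter names, the sigmoid activations, and the toy dimensions are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def highway_layer(h_prev, W, b, W_T, b_T, W_C, b_C):
    """One highway layer (assumed form): gated mix of a transformed
    input and the unchanged input.

    h_prev     : (batch, dim) activations from the previous layer
    W, b       : weights and bias of the ordinary hidden layer
    W_T, b_T   : weights and bias of the transform gate (hypothetical names)
    W_C, b_C   : weights and bias of the carry gate (hypothetical names)
    """
    candidate = sigmoid(h_prev @ W + b)      # ordinary hidden-layer output
    transform = sigmoid(h_prev @ W_T + b_T)  # how much of the candidate to use
    carry = sigmoid(h_prev @ W_C + b_C)      # how much of the input to pass on
    return candidate * transform + h_prev * carry


# Toy data: 2 frames with 4-dimensional activations (arbitrary sizes).
rng = np.random.default_rng(0)
dim = 4
h = rng.standard_normal((2, dim))
W = rng.standard_normal((dim, dim))
b = np.zeros(dim)

# Randomly initialised gates: output mixes candidate and input.
W_T = rng.standard_normal((dim, dim))
W_C = rng.standard_normal((dim, dim))
out = highway_layer(h, W, b, W_T, np.zeros(dim), W_C, np.zeros(dim))

# Gates forced by their biases alone: transform gate closed, carry gate
# open, so the layer passes its input through almost unchanged.
zeros = np.zeros((dim, dim))
passthrough = highway_layer(
    h, W, b, zeros, np.full(dim, -50.0), zeros, np.full(dim, 50.0)
)
```

In this sketch, changing only the gate biases switches the layer between transforming its input and copying it, which is one way to picture how a small number of gate parameters can steer the behavior of the network.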


Pages: 1502-1511

Page count: 10

