Binary neural networks for speech recognition

Cited by: 14

Authors:

Qian, Yan-min [1, 2]

Xiang, Xu [1, 2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai 200240, Peoples R China

[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, SpeechLab, Shanghai 200240, Peoples R China

Source:

FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING | 2019 / Vol. 20 / Issue 05

Funding:

National Natural Science Foundation of China

Keywords:

Speech recognition; Binary neural networks; Binary matrix multiplication; Knowledge distillation; Population count

DOI:

10.1631/FITEE.1800469

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Deep neural networks (DNNs) have recently been shown to significantly outperform Gaussian mixture models (GMMs) in acoustic modeling for speech recognition. However, the substantial increase in computational load at inference time makes deep models difficult to deploy directly on low-power embedded devices. To alleviate this issue, structured sparsity and low-precision fixed-point quantization have been widely applied. In this work, binary neural networks are developed for speech recognition to reduce the computational cost of inference. A fast implementation of binary matrix multiplication is introduced; on modern central processing unit (CPU) and graphics processing unit (GPU) architectures, it achieves a 5-7 times speedup over full-precision floating-point matrix multiplication in real applications. Several kinds of binary neural networks and related model optimization algorithms are developed for acoustic modeling in large vocabulary continuous speech recognition. In addition, to improve the accuracy of binary models, knowledge distillation from the normal full-precision floating-point model to the compressed binary model is explored. Experiments on the standard Switchboard speech recognition task show that the proposed binary neural networks deliver a 3-4 times speedup over normal full-precision deep models. With knowledge distillation from the normal floating-point models, the binary DNNs and binary convolutional neural networks (CNNs) restrict the word error rate (WER) degradation to within 15.0% compared with the corresponding full-precision floating-point DNNs and CNNs, respectively. In particular, for the binary CNN with binarization only on the convolutional layers, the WER degradation with the proposed approach is very small and almost negligible.
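The abstract attributes the speedup to a fast binary matrix multiplication built on population count. The following is a minimal Python sketch of the general XNOR-popcount idea, not the authors' CPU/GPU kernels. It assumes that values are binarized to {+1, -1} with a sign function (zero mapped to +1) and that each vector is packed into a single Python integer. All function names are hypothetical.

```python
# Sketch of binary (XNOR-popcount) matrix multiplication.
# Assumptions: values are binarized with sign() and zero maps to +1;
# each {+1, -1} vector is packed into one Python int (bit i set <=> element i is +1).
# Function names are hypothetical and not taken from the paper.

def binarize(values):
    """Map real values to {+1, -1}; zero is mapped to +1 (an assumed convention)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a {+1, -1} vector into an int: bit i is 1 when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {+1, -1} vectors of length n.

    XNOR marks positions where the signs agree; popcount counts them.
    Each agreement contributes +1 and each disagreement -1, so
    dot = matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bits
    xnor = ~(a_bits ^ b_bits) & mask    # 1 where the signs agree
    matches = bin(xnor).count("1")      # population count
    return 2 * matches - n


def binary_matmul(A, B):
    """Binarize A (m x n) and B (n x p), then multiply them with XNOR-popcount."""
    n = len(A[0])
    p = len(B[0])
    rows = [pack_bits(binarize(row)) for row in A]
    cols = [pack_bits(binarize([B[k][j] for k in range(n)])) for j in range(p)]
    return [[binary_dot(r, c, n) for c in cols] for r in rows]


if __name__ == "__main__":
    A = [[0.5, -1.2, 3.0, -0.1]]
    B = [[1.0], [2.0], [-0.3], [-4.0]]
    print(binary_matmul(A, B))  # signs (+,-,+,-) . (+,+,-,-) -> [[0]]
```

In an optimized kernel, the signs would be packed into 32- or 64-bit machine words, and the popcount would map to a hardware instruction such as x86 POPCNT or CUDA's `__popc`. Because each instruction then processes many elements at once, this is the usual source of the speedup over floating-point multiplication. The sketch favors readability over speed.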


Pages: 701-715

Page count: 15
