Binary neural networks for speech recognition

被引:14
|
作者
Qian, Yan-min [1 ,2 ]
Xiang, Xu [1 ,2 ]
机构
[1] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, SpeechLab, Shanghai 200240, Peoples R China
基金
中国国家自然科学基金;
关键词
\ Speech recognition; Binary neural networks; Binary matrix multiplication; Knowledge distillation; Population count;
D O I
10.1631/FITEE.1800469
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Recently, deep neural networks (DNNs) significantly outperform Gaussian mixture models in acoustic modeling for speech recognition. However, the substantial increase in computational load during the inference stage makes deep models difficult to directly deploy on low-power embedded devices. To alleviate this issue, structure sparseness and low precision fixed-point quantization have been applied widely. In this work, binary neural networks for speech recognition are developed to reduce the computational cost during the inference stage. A fast implementation of binary matrix multiplication is introduced. On modern central processing unit (CPU) and graphics processing unit (GPU) architectures, a 5-7 times speedup compared with full precision floatingpoint matrix multiplication can be achieved in real applications. Several kinds of binary neural networks and related model optimization algorithms are developed for large vocabulary continuous speech recognition acoustic modeling. In addition, to improve the accuracy of binary models, knowledge distillation from the normal full precision floating-point model to the compressed binary model is explored. Experiments on the standard Switchboard speech recognition task show that the proposed binary neural networks can deliver 3-4 times speedup over the normal full precision deep models. With the knowledge distillation from the normal floating-point models, the binary DNNs or binary convolutional neural networks (CNNs) can restrict the word error rate (WER) degradation to within 15.0%, compared to the normal full precision floating-point DNNs or CNNs, respectively. Particularly for the binary CNN with binarization only on the convolutional layers, the WER degradation is very small and is almost negligible with the proposed approach.
引用
收藏
页码:701 / 715
页数:15
相关论文
共 50 条
  • [31] SPEECH RECOGNITION WITH HIERARCHICAL RECURRENT NEURAL NETWORKS
    CHEN, WY
    LIAO, YF
    CHEN, SH
    [J]. PATTERN RECOGNITION, 1995, 28 (06) : 795 - 805
  • [32] Gaussian Process Neural Networks for Speech Recognition
    Lam, Max W. Y.
    Hu, Shoukang
    Xie, Xurong
    Liu, Shansong
    Yu, Jianwei
    Su, Rongfeng
    Liu, Xunying
    Meng, Helen
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1778 - 1782
  • [33] Parallel Training of Neural Networks for Speech Recognition
    Vesely, Karel
    Burget, Lukas
    Grezl, Frantisek
    [J]. TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 439 - 446
  • [34] Visual speech recognition by recurrent neural networks
    Rabi, G
    Lu, SW
    [J]. 1997 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CONFERENCE PROCEEDINGS, VOLS I AND II: ENGINEERING INNOVATION: VOYAGE OF DISCOVERY, 1997, : 55 - 58
  • [35] AN ANALYSIS OF CONVOLUTIONAL NEURAL NETWORKS FOR SPEECH RECOGNITION
    Huang, Jui-Ting
    Li, Jinyu
    Gong, Yifan
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4989 - 4993
  • [36] Unfolded Recurrent Neural Networks for Speech Recognition
    Saon, George
    Soltau, Hagen
    Emami, Ahmad
    Picheny, Michael
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 343 - 347
  • [37] SPEECH RECOGNITION WITH DEEP RECURRENT NEURAL NETWORKS
    Graves, Alex
    Mohamed, Abdel-rahman
    Hinton, Geoffrey
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6645 - 6649
  • [38] Emotion Recognition in Speech Using Neural Networks
    J. Nicholson
    K. Takahashi
    R. Nakatsu
    [J]. Neural Computing & Applications, 2000, 9 : 290 - 296
  • [39] Speech recognition using Elman neural networks
    Rothkrantz, LJM
    Nollen, D
    [J]. TEXT, SPEECH AND DIALOGUE, 1999, 1692 : 146 - 151
  • [40] Automatic Speech Recognition Based on Neural Networks
    Schlueter, Ralf
    Doetsch, Patrick
    Golik, Pavel
    Kitza, Markus
    Menne, Tobias
    Irie, Kazuki
    Tueske, Zoltan
    Zeyer, Albert
    [J]. SPEECH AND COMPUTER, 2016, 9811 : 3 - 17