Binary neural networks for speech recognition

Cited by: 14
Authors
Qian, Yan-min [1, 2]
Xiang, Xu [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interac, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, SpeechLab, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech recognition; Binary neural networks; Binary matrix multiplication; Knowledge distillation; Population count
DOI
10.1631/FITEE.1800469
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, deep neural networks (DNNs) have significantly outperformed Gaussian mixture models in acoustic modeling for speech recognition. However, the substantial increase in computational load during the inference stage makes deep models difficult to deploy directly on low-power embedded devices. To alleviate this issue, structure sparseness and low-precision fixed-point quantization have been widely applied. In this work, binary neural networks for speech recognition are developed to reduce the computational cost during the inference stage. A fast implementation of binary matrix multiplication is introduced. On modern central processing unit (CPU) and graphics processing unit (GPU) architectures, a 5-7 times speedup over full-precision floating-point matrix multiplication can be achieved in real applications. Several kinds of binary neural networks and related model optimization algorithms are developed for large vocabulary continuous speech recognition acoustic modeling. In addition, to improve the accuracy of binary models, knowledge distillation from the normal full-precision floating-point model to the compressed binary model is explored. Experiments on the standard Switchboard speech recognition task show that the proposed binary neural networks can deliver a 3-4 times speedup over the normal full-precision deep models. With knowledge distillation from the normal floating-point models, the binary DNNs and binary convolutional neural networks (CNNs) can restrict the word error rate (WER) degradation to within 15.0% compared to the normal full-precision floating-point DNNs and CNNs, respectively. In particular, for the binary CNN with binarization only on the convolutional layers, the WER degradation is very small and almost negligible with the proposed approach.
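The abstract names binary matrix multiplication and population count without showing how they combine. The sketch below illustrates the generic XNOR-plus-popcount technique in Python; it is not the authors' implementation, and the function names (`pack_signs`, `binary_dot`, `binary_matmul`) and the encoding (+1 stored as bit 1, -1 as bit 0) are assumptions made for illustration. For two length-n vectors with entries in {-1, +1}, the dot product equals 2 × (number of positions where the signs agree) − n, so a multiply-accumulate loop can be replaced by one XNOR and one population count per packed word.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into one integer.

    Assumed encoding: bit i is 1 when values[i] is +1 and 0 when it is -1.
    """
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given as packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 where the two signs match
    matches = bin(agree).count("1")     # population count of matching signs
    return 2 * matches - n              # matches - mismatches


def binary_matmul(A, B):
    """Multiply A (m x n) by B (n x p), both with entries in {-1, +1}."""
    n = len(B)
    rows = [pack_signs(r) for r in A]         # pack each row of A once
    cols = [pack_signs(c) for c in zip(*B)]   # pack each column of B once
    return [[binary_dot(r, c, n) for c in cols] for r in rows]


if __name__ == "__main__":
    A = [[1, -1, 1], [-1, -1, 1]]
    B = [[1, -1], [1, 1], [-1, 1]]
    print(binary_matmul(A, B))
```

For the small matrices in the usage example, the result is `[[-1, -1], [-3, 1]]`, matching ordinary integer matrix multiplication. A production kernel would pack signs into fixed-width machine words and use a hardware population-count instruction; Python's arbitrary-precision integers stand in for those words here.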
Pages: 701-715
Number of pages: 15
Related papers
50 in total
  • [1] Binary neural networks for speech recognition
    Yan-min Qian
    Xu Xiang
    [J]. Frontiers of Information Technology & Electronic Engineering, 2019, 20 : 701 - 715
  • [2] Binary Deep Neural Networks for Speech Recognition
    Xiang, Xu
    Qian, Yanmin
    Yu, Kai
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 533 - 537
  • [3] Review of Neural Networks for Speech Recognition
    Lippmann, Richard P.
    [J]. NEURAL COMPUTATION, 1989, 1 (01) : 1 - 38
  • [4] Neural networks used for speech recognition
    El-Ramly, SH
    Abdel-Kader, NS
    El-Adawi, R
    [J]. 2002 IEEE PROCEEDINGS OF THE NINETEENTH NATIONAL RADIO SCIENCE CONFERENCE, VOLS 1 AND 2, 2002, : 200 - 207
  • [5] Speech recognition with artificial neural networks
    Dede, Guelin
    Sazli, Murat Huesnue
    [J]. DIGITAL SIGNAL PROCESSING, 2010, 20 (03) : 763 - 768
  • [6] Residual Neural Networks for Speech Recognition
    Vydana, Hari Krishna
    Vuppala, Anil Kumar
    [J]. 2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 543 - 547
  • [7] Convolutional Neural Networks for Speech Recognition
    Abdel-Hamid, Ossama
    Mohamed, Abdel-Rahman
    Jiang, Hui
    Deng, Li
    Penn, Gerald
    Yu, Dong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1533 - 1545
  • [8] Speech Recognition with Temporal Neural Networks
    Lin, Payton
    Lyu, Dau-Cheng
    Chang, Yun-Fan
    Tsao, Yu
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 21 - 25
  • [9] Speech recognition using neural networks
    Khan, SU
    Sharma, G
    Rao, PRK
    [J]. PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY 2000, VOLS 1 AND 2, 2000, : 432 - 437