A Threshold Neuron Pruning for a Binarized Deep Neural Network on an FPGA

Cited by: 14
Authors
Fujii, Tomoya [1 ]
Sato, Shimpei [1 ]
Nakahara, Hiroki [1 ]
Affiliations
[1] Tokyo Inst Technol, Dept Informat & Commun Engn, Tokyo 1528552, Japan
Keywords
machine learning; deep learning; pruning; FPGA;
DOI
10.1587/transinf.2017RCP0013
CLC Classification Number
TP [automation and computer technology]
Subject Classification Number
0812
Abstract
A pre-trained deep convolutional neural network (CNN) for an embedded system requires both high speed and low power consumption. The front part of a CNN consists of convolutional layers, while the back part consists of fully connected layers. In the convolutional layers, the multiply-accumulate operation is the bottleneck, whereas in the fully connected layers, memory access is the bottleneck. The binarized CNN has been proposed to realize many multiply-accumulate circuits on an FPGA, so the convolutional layers can be computed at high speed. However, even if binarization is applied to the fully connected layers, the amount of weight memory remains a bottleneck. In this paper, we propose a neuron pruning technique that eliminates most of the weight memory, and we apply it to the fully connected layers of a binarized CNN. In that case, since the weight memory can be realized by on-chip memory on the FPGA, high-speed memory access is achieved. To further reduce the memory size, we retrain the CNN after neuron pruning. We also propose a sequential-input parallel-output circuit for the binarized fully connected layer and a streaming circuit for the binarized 2D convolutional layer. The experimental results show that, for the fully connected layers of the VGG-11 CNN, neuron pruning reduces the number of neurons by 39.8% while retaining 99% of the baseline accuracy. We implemented the pruned CNN on the Xilinx Inc. Zynq ZedBoard. Compared with the ARM Cortex-A57, it was 1773.0 times faster, dissipated 3.1 times less power, and its performance per watt was 5781.3 times better. Compared with the Maxwell GPU, it was 11.1 times faster, dissipated 7.7 times less power, and its performance per watt was 84.1 times better. Thus, the binarized CNN on the FPGA is well suited to embedded systems.
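The neuron pruning idea in the abstract can be sketched in a few lines. The sketch below is an illustrative assumption, not the paper's exact threshold criterion: a hidden neuron is scored by the L1 norm of its (real-valued, pre-binarization) outgoing weights, and neurons scoring below a threshold are removed by deleting the matching row of the incoming weight matrix and column of the outgoing one, so no on-chip memory is spent on them.

```python
import numpy as np

def prune_neurons(w_in, w_out, threshold):
    """Threshold-prune hidden neurons of a fully connected layer pair.

    w_in:  weights into the hidden layer, shape (hidden, inputs), binarized to {-1, +1}
    w_out: weights out of the hidden layer, shape (outputs, hidden), real-valued
           before binarization (an assumption for scoring purposes)
    A neuron survives only if the L1 norm of its outgoing weights
    exceeds `threshold` (illustrative criterion, not the paper's).
    """
    scores = np.abs(w_out).sum(axis=0)   # one importance score per hidden neuron
    keep = scores > threshold            # boolean mask of surviving neurons
    # drop the pruned neurons' rows (incoming) and columns (outgoing)
    return w_in[keep, :], w_out[:, keep], keep

# toy example: 8 hidden neurons, 4 inputs, 3 outputs
rng = np.random.default_rng(0)
w_in = rng.choice([-1, 1], size=(8, 4))      # binarized incoming weights
w_out = rng.normal(size=(3, 8))              # real-valued outgoing weights
w_in_p, w_out_p, keep = prune_neurons(w_in, w_out, threshold=2.5)
print("kept", int(keep.sum()), "of", keep.size, "neurons")
```

After pruning, retraining the surviving weights (as the paper does) can recover most of the accuracy lost by removing neurons.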
Pages: 376 - 386 (11 pages)
Related Papers (50 total)
  • [1] An FPGA Realization of a Deep Convolutional Neural Network Using a Threshold Neuron Pruning
    Fujii, Tomoya
    Sato, Simpei
    Nakahara, Hiroki
    Motomura, Masato
    [J]. APPLIED RECONFIGURABLE COMPUTING, 2017, 10216 : 268 - 280
  • [2] A Batch Normalization Free Binarized Convolutional Deep Neural Network on an FPGA
    Nakahara, Hiroki
    Yonekawa, Haruyoshi
    Iwamoto, Hisashi
    Motomura, Masato
    [J]. FPGA'17: PROCEEDINGS OF THE 2017 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2017, : 290 - 290
  • [3] An FSCV Deep Neural Network: Development, Pruning, and Acceleration on an FPGA
    Zhang, Zhichao
    Oh, Yoonbae
    Adams, Scott D.
    Bennet, Kevin E.
    Kouzani, Abbas Z.
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (06) : 2248 - 2259
  • [4] FP-BNN: Binarized neural network on FPGA
    Liang, Shuang
    Yin, Shouyi
    Liu, Leibo
    Luk, Wayne
    Wei, Shaojun
    [J]. NEUROCOMPUTING, 2018, 275 : 1072 - 1086
  • [5] Binarized Depthwise Separable Neural Network for Object Tracking in FPGA
    Yang, Li
    He, Zhezhi
    Fan, Deliang
    [J]. GLSVLSI '19 - PROCEEDINGS OF THE 2019 ON GREAT LAKES SYMPOSIUM ON VLSI, 2019, : 347 - 350
  • [6] All Binarized Convolutional Neural Network and Its implementation on an FPGA
    Shimoda, Masayuki
    Sato, Shimpei
    Nakahara, Hiroki
    [J]. 2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 291 - 294
  • [7] Implementing Binarized Neural Network Processor on FPGA-Based Platform
    Lee, Jeahack
    Kim, Hyeonseong
    Kim, Byung-Soo
    Jeon, Seokhun
    Lee, Jung Chul
    Kim, Dong Sun
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA, 2022, : 469 - 471
  • [8] FPGA based Implementation of Binarized Neural Network for Sign Language Application
    Jaiswal, Mohita
    Sharma, Vaidehi
    Sharma, Abhishek
    Saini, Sandeep
    Tomar, Raghuvir
    [J]. 2021 IEEE INTERNATIONAL SYMPOSIUM ON SMART ELECTRONIC SYSTEMS (ISES 2021), 2021, : 303 - 306
  • [9] A Fully Connected Layer Elimination for a Binarized Convolutional Neural Network on an FPGA
    Nakahara, Hiroki
    Fujii, Tomoya
    Sato, Shimpei
    [J]. 2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2017,
  • [10] Pruning by explaining: A novel criterion for deep neural network pruning
    Yeom, Seul-Ki
    Seegerer, Philipp
    Lapuschkin, Sebastian
    Binder, Alexander
    Wiedemann, Simon
    Mueller, Klaus-Robert
    Samek, Wojciech
    [J]. PATTERN RECOGNITION, 2021, 115