A High-Performance Accelerator for Large-Scale Convolutional Neural Networks

Cited by: 17
Authors
Sun, Fan [1]
Wang, Chao [1]
Gong, Lei [1]
Xu, Chongchong [1]
Zhang, Yiwei [1]
Lu, Yuntao [1]
Li, Xi [1]
Zhou, Xuehai [1]
Affiliations
[1] USTC, Dept Comp Sci & Technol, Hefei 230027, Anhui, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
DOI
10.1109/ISPA/IUCC.2017.00099
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Convolutional neural networks (CNNs) have been widely applied because they can achieve accuracy close to, or even better than, human-level perception. However, for large-scale CNNs, the computation-intensive convolutional layers and the memory-intensive fully connected layers make implementation on FPGA platforms challenging. Existing implementations use the same parallelism strategy for the entire CNN model; such a "one size fits all" approach may lead to poor resource utilization. To overcome this problem, this work proposes an FPGA-based accelerator consisting of multiple processing elements (PEs), each responsible for the computation of one layer in the network model. All the PEs are mapped onto one chip so that different layers can work concurrently in a pipelined style. A methodology is proposed to maximize the throughput of the accelerator. In the fully connected layers, a pruning method is used to decrease the number of weights, which reduces both storage and computation. Moreover, a batch-based computing method is applied to the compressed data in order to reduce the required memory bandwidth. As a case study, we implement a large-scale CNN model, AlexNet, on a VC707 board, which carries a Xilinx Virtex-7 485T FPGA. The proposed accelerator achieves a peak performance of 498.6 GOP/s and a power efficiency of 21.3 GOP/s/W at a 100 MHz clock frequency, which outperforms previous approaches.
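The pruning and batch-based computing steps mentioned for the fully connected layers can be illustrated with a minimal software sketch. The abstract does not state the pruning criterion or the compressed storage format, so the magnitude threshold, the per-row (index, weight) list, and the names `prune` and `fc_batched` below are all assumptions made for illustration, not the authors' design.

```python
from typing import List, Tuple

# One pruned weight row: (column index, weight) pairs that survived pruning.
Row = List[Tuple[int, float]]


def prune(weights: List[List[float]], threshold: float) -> List[Row]:
    """Keep only weights with magnitude >= threshold (assumed criterion)."""
    return [
        [(j, w) for j, w in enumerate(row) if abs(w) >= threshold]
        for row in weights
    ]


def fc_batched(sparse: List[Row], batch: List[List[float]]) -> List[List[float]]:
    """Fully connected layer applied to a batch of input vectors.

    Each stored weight is read once and applied to every input in the
    batch, so weight traffic per input shrinks as the batch grows.
    """
    out = [[0.0] * len(sparse) for _ in batch]
    for i, row in enumerate(sparse):
        for j, w in row:                    # one weight fetch ...
            for b, x in enumerate(batch):   # ... reused across the batch
                out[b][i] += w * x[j]
    return out


# Hypothetical 2x3 weight matrix and a batch of two input vectors.
weights = [[0.5, -0.01, 0.0], [0.02, -2.0, 1.0]]
sparse = prune(weights, threshold=0.1)
batch = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]
result = fc_batched(sparse, batch)
print(result)
```

In this toy case three of the six weights survive pruning, and each surviving weight is fetched once for both inputs rather than once per input, which is the bandwidth saving that batching is meant to provide.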
Pages: 622-629
Page count: 8