Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network

Cited by: 142
Authors
Zhang, Jialiang [1]
Li, Jing [1]
Affiliations
[1] Univ Wisconsin, Dept Elect & Comp Engn, Madison, WI 53706 USA
DOI
10.1145/3020078.3021698
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
OpenCL FPGA has recently gained great popularity with emerging needs for workload acceleration such as Convolutional Neural Network (CNN), which is the most popular deep learning architecture in the domain of computer vision. While OpenCL enhances the code portability and programmability of FPGA, it comes at the expense of performance. The key challenge is to optimize the OpenCL kernels to efficiently utilize the flexible hardware resources in FPGA. Simply optimizing the OpenCL kernel code through various compiler options turns out to be insufficient to achieve desirable performance for both compute-intensive and data-intensive workloads such as convolutional neural networks. In this paper, we first propose an analytical performance model and apply it to perform an in-depth analysis of the resource requirements of CNN classifier kernels and the available resources on modern FPGAs. We identify that the key performance bottleneck is the on-chip memory bandwidth. We propose a new kernel design to effectively address this bandwidth limitation and to provide an optimal balance between computation, on-chip, and off-chip memory access. As a case study, we further apply these techniques to design a CNN accelerator based on the VGG model. Finally, we evaluate the performance of our CNN accelerator using an Altera Arria 10 GX1150 board. We achieve 866 Gop/s floating-point performance at 370 MHz working frequency and 1.79 Top/s 16-bit fixed-point performance at 385 MHz. To the best of our knowledge, our implementation achieves the best power efficiency and performance density compared to existing work.
Pages: 25-34
Number of pages: 10