Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network

Cited by: 142
Authors
Zhang, Jialiang [1]
Li, Jing [1]
Affiliations
[1] Univ Wisconsin, Dept Elect & Comp Engn, Madison, WI 53706 USA
DOI
10.1145/3020078.3021698
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
OpenCL FPGA has recently gained great popularity with emerging needs for workload acceleration such as Convolutional Neural Network (CNN), which is the most popular deep learning architecture in the domain of computer vision. While OpenCL enhances the code portability and programmability of FPGA, it comes at the expense of performance. The key challenge is to optimize the OpenCL kernels to efficiently utilize the flexible hardware resources in FPGA. Simply optimizing the OpenCL kernel code through various compiler options turns out to be insufficient to achieve desirable performance for both compute-intensive and data-intensive workloads such as convolutional neural networks. In this paper, we first propose an analytical performance model and apply it to perform an in-depth analysis of the resource requirements of CNN classifier kernels and the available resources on modern FPGAs. We identify that the key performance bottleneck is the on-chip memory bandwidth. We propose a new kernel design to effectively address this bandwidth limitation and to provide an optimal balance between computation, on-chip, and off-chip memory access. As a case study, we further apply these techniques to design a CNN accelerator based on the VGG model. Finally, we evaluate the performance of our CNN accelerator using an Altera Arria 10 GX1150 board. We achieve 866 Gop/s floating-point performance at a 370 MHz working frequency and 1.79 Top/s 16-bit fixed-point performance at 385 MHz. To the best of our knowledge, our implementation achieves the best power efficiency and performance density compared to existing work.
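The throughput and clock figures quoted in the abstract can be turned into an average number of operations per clock cycle, which gives a rough sense of how much parallelism the two designs sustain. The sketch below is only back-of-the-envelope arithmetic on the abstract's numbers; the helper name `ops_per_cycle` is hypothetical, and what the paper counts as one "op" is not stated in this record.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# `ops_per_cycle` is a hypothetical helper, not something defined by the paper.

def ops_per_cycle(throughput_ops_per_s: float, clock_hz: float) -> float:
    """Average number of operations completed per clock cycle."""
    return throughput_ops_per_s / clock_hz

# Floating point: 866 Gop/s at 370 MHz
float_ops = ops_per_cycle(866e9, 370e6)

# 16-bit fixed point: 1.79 Top/s at 385 MHz
fixed_ops = ops_per_cycle(1.79e12, 385e6)

# How much more work per cycle the fixed-point design sustains
ratio = fixed_ops / float_ops

print(f"floating point: {float_ops:.0f} ops/cycle")
print(f"fixed point:    {fixed_ops:.0f} ops/cycle")
print(f"fixed/float:    {ratio:.2f}x")
```

Under these figures the fixed-point design sustains roughly twice as many operations per cycle as the floating-point one, so most of the throughput gain comes from added parallelism rather than from the slightly higher clock.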
Pages: 25-34
Page count: 10
Related papers
50 in total
  • [1] Improving the Performance of Whale Optimization Algorithm through OpenCL-Based FPGA Accelerator
    Jiang, Qiangqiang
    Guo, Yuanjun
    Yang, Zhile
    Wang, Zheng
    Yang, Dongsheng
    Zhou, Xianyu
    COMPLEXITY, 2020, 2020
  • [2] Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks
    Suda, Naveen
    Chandra, Vikas
    Dasika, Ganesh
    Mohanty, Abinash
    Ma, Yufei
    Vrudhula, Sarma
    Seo, Jae-Sun
    Cao, Yu
    PROCEEDINGS OF THE 2016 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA'16), 2016, : 16 - 25
  • [3] OpenCL-based design of an FPGA accelerator for quantum annealing simulation
    Hasitha Muthumala Waidyasooriya
    Masanori Hariyama
    Masamichi J. Miyama
    Masayuki Ohzeki
    The Journal of Supercomputing, 2019, 75 : 5019 - 5039
  • [4] OpenCL-based design of an FPGA accelerator for quantum annealing simulation
    Waidyasooriya, Hasitha Muthumala
    Hariyama, Masanori
    Miyama, Masamichi J.
    Ohzeki, Masayuki
    JOURNAL OF SUPERCOMPUTING, 2019, 75 (08): : 5019 - 5039
  • [5] An OpenCL-Based FPGA Accelerator for Compressed YOLOv2
    Yang, Anrong
    Li, Yuanhui
    Shu, Hongqiao
    Deng, Jianlin
    Ma, Chuanzhao
    Li, Zheng
    Wang, Qigang
    2019 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT 2019), 2019, : 235 - 238
  • [6] A Scalable OpenCL-Based FPGA Accelerator For YOLOv2
    Xu, Ke
    Wang, Xiaoyun
    Wang, Dong
    2019 27TH IEEE ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2019, : 317 - 317
  • [7] PipeCNN: An OpenCL-Based Open-Source FPGA Accelerator for Convolution Neural Networks
    Wang, Dong
    Xu, Ke
    Jiang, Diankun
    2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 279 - 282
  • [8] An OpenCL-Based FPGA Accelerator for Faster R-CNN
    An, Jianjing
    Zhang, Dezheng
    Xu, Ke
    Wang, Dong
    ENTROPY, 2022, 24 (10)
  • [9] An OpenCL-Based Hybrid CNN-RNN Inference Accelerator On FPGA
    Sun, Yunfei
    Liu, Brian
    Xu, Xianchao
    2019 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT 2019), 2019, : 283 - 286
  • [10] Design of FPGA-Based Accelerator for Convolutional Neural Network under Heterogeneous Computing Framework with OpenCL
    Luo, Li
    Wu, Yakun
    Qiao, Fei
    Yang, Yi
    Wei, Qi
    Zhou, Xiaobo
    Fan, Yongkai
    Xu, Shuzheng
    Liu, Xinjun
    Yang, Huazhong
    INTERNATIONAL JOURNAL OF RECONFIGURABLE COMPUTING, 2018, 2018