Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network

Cited by: 142
Authors
Zhang, Jialiang [1]
Li, Jing [1]
Affiliations
[1] Univ Wisconsin, Dept Elect & Comp Engn, Madison, WI 53706 USA
Keywords
DOI
10.1145/3020078.3021698
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
OpenCL FPGA has recently gained great popularity with emerging needs for workload acceleration such as Convolutional Neural Network (CNN), which is the most popular deep learning architecture in the domain of computer vision. While OpenCL enhances the code portability and programmability of FPGA, it comes at the expense of performance. The key challenge is to optimize the OpenCL kernels to efficiently utilize the flexible hardware resources in FPGA. Simply optimizing the OpenCL kernel code through various compiler options turns out to be insufficient to achieve desirable performance for both compute-intensive and data-intensive workloads such as convolutional neural networks. In this paper, we first propose an analytical performance model and apply it to perform an in-depth analysis of the resource requirements of CNN classifier kernels and the available resources on modern FPGAs. We identify that the key performance bottleneck is the on-chip memory bandwidth. We propose a new kernel design to effectively address this bandwidth limitation and to provide an optimal balance between computation, on-chip, and off-chip memory access. As a case study, we further apply these techniques to design a CNN accelerator based on the VGG model. Finally, we evaluate the performance of our CNN accelerator using an Altera Arria 10 GX1150 board. We achieve 866 Gop/s floating-point performance at 370 MHz working frequency and 1.79 Top/s 16-bit fixed-point performance at 385 MHz. To the best of our knowledge, our implementation achieves the best power efficiency and performance density compared to existing work.
Pages: 25 - 34
Page count: 10
Related papers
50 in total
  • [31] FPGA-Based Reconfigurable Convolutional Neural Network Accelerator Using Sparse and Convolutional Optimization
    Gowda, Kavitha Malali Vishveshwarappa
    Madhavan, Sowmya
    Rinaldi, Stefano
    Divakarachari, Parameshachari Bidare
    Atmakur, Anitha
    ELECTRONICS, 2022, 11 (10)
  • [32] An FPGA-Based Computation-Efficient Convolutional Neural Network Accelerator
    Archana, V. S.
    2022 IEEE INTERNATIONAL POWER AND RENEWABLE ENERGY CONFERENCE, IPRECON, 2022,
  • [33] Scalable FPGA-Based Convolutional Neural Network Accelerator for Embedded Systems
    Zhao, Jingyuan
    Yin, Zhendong
    Zhao, Yanlong
    Wu, Mingyang
    Xu, Mingdong
    2019 4TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (ICCIA 2019), 2019, : 36 - 40
  • [34] Toward In-System Monitoring of OpenCL-Based Designs on FPGA
    Bensalem, Hachem
    Blaquiere, Yves
    Savaria, Yvon
    2019 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2019,
  • [35] Optimizing Convolutional Neural Network on FPGA under Heterogeneous Computing Framework with OpenCL
    Wang, Zhengrong
    Qiao, Fei
    Liu, Zhen
    Shan, Yuxiang
    Zhou, Xunyi
    Luo, Li
    Yang, Huazhong
    PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON), 2016, : 3433 - 3438
  • [36] A convolutional neural network accelerator on FPGA for crystallography spot screening
    Jiang, Yuwei
    Feng, Yingqi
    Ren, Tao
    Zhu, Yongxin
    PROCEEDINGS OF THE 2024 IEEE 10TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, HPSC 2024, 2024, : 66 - 70
  • [37] Optimizing FPGA-Based Convolutional Neural Network Performance
    Kao, Chi-Chou
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2023, 32 (15)
  • [38] OpenCL-based Virtual Prototyping and Simulation of Many-Accelerator Architectures
    Sotiriou-Xanthopoulos, Efstathios
    Masing, Leonard
    Xydis, Sotirios
    Siozios, Kostas
    Becker, Juergen
    Soudris, Dimitrios
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2018, 17 (05)
  • [39] OpenCL-Based Performance Enhancement of Model Transformations
    Fekete, Tamas
    Mezei, Gergely
    IWOCL'18: PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON OPENCL, 2018, : 89 - 90
  • [40] Implementation of Data-optimized FPGA-based Accelerator for Convolutional Neural Network
    Cho, Mannhee
    Kim, Youngmin
    2020 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2020,