Optimizing OpenCL-Based CNN Design on FPGA with Comprehensive Design Space Exploration and Collaborative Performance Modeling

Cited by: 9
Authors
Mu, Jiandong [1]
Zhang, Wei [1]
Liang, Hao [2]
Sinha, Sharad [3]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
[3] Indian Inst Technol IIT, Veling, Goa, India
Keywords
CNN; modeling; hardware design; design space exploration
DOI
10.1145/3397514
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Recent success in applying convolutional neural networks (CNNs) to object detection and classification has sparked great interest in accelerating CNNs using hardware such as field-programmable gate arrays (FPGAs). However, finding an efficient FPGA design for a given CNN model and FPGA board is not trivial, since it requires a strong background in hardware design and detailed knowledge of the target board. In this work, we address this problem through design space exploration with a collaborative framework. Our framework consists of three main parts: FPGA design generation, coarse-grained modeling, and fine-grained modeling. In FPGA design generation, we propose a novel data structure, LoopTree, to capture the details of the FPGA design for CNN applications without writing the source code. Different LoopTrees, which represent different FPGA designs, are automatically generated in this process. A coarse-grained model evaluates LoopTrees at the operation level (e.g., add, mult) so that the most efficient LoopTrees can be selected. A fine-grained model, which is based on the source code, then refines the selected design in a cycle-accurate manner. A comprehensive set of OpenCL-based designs has been implemented on board to verify our framework. Average estimation errors of 8.87% and 4.8% have been observed for our coarse-grained model and fine-grained model, respectively. These errors are much lower than those of the prevalent operation-statistics-based estimation, which is obtained from a predefined formula for specific loop schedules.
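The abstract does not define LoopTree or the coarse-grained model in detail, so the following is only a rough sketch of the general idea: describe a loop nest as a tree, count operations such as add and mult per node, and rank candidate designs by a simple cycle estimate. The class `LoopNode`, its fields, the cost rule in `coarse_cycles`, and the layer dimensions are all hypothetical choices made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field
from math import ceil
from typing import Dict, List


@dataclass
class LoopNode:
    """Hypothetical loop-nest node; NOT the paper's actual LoopTree definition."""
    name: str
    trip_count: int
    unroll: int = 1  # assumed number of iterations executed in parallel
    ops: Dict[str, int] = field(default_factory=dict)  # operations per iteration
    children: List["LoopNode"] = field(default_factory=list)


def total_ops(node: LoopNode) -> Dict[str, int]:
    """Count how many operations of each kind the whole nest executes."""
    counts = {kind: n * node.trip_count for kind, n in node.ops.items()}
    for child in node.children:
        for kind, n in total_ops(child).items():
            counts[kind] = counts.get(kind, 0) + n * node.trip_count
    return counts


def coarse_cycles(node: LoopNode) -> int:
    """Toy estimate (assumption): a body with operations costs one cycle,
    and unrolled iterations of a loop run fully in parallel."""
    body = (1 if node.ops else 0) + sum(coarse_cycles(c) for c in node.children)
    return ceil(node.trip_count / node.unroll) * body


def make_design(unroll_ic: int, unroll_k: int) -> LoopNode:
    """Build one candidate for an illustrative convolution loop nest:
    16 output channels, 8 input channels, a 3x3 kernel (9 taps)."""
    k = LoopNode("kernel", 9, unroll_k, ops={"mult": 1, "add": 1})
    ic = LoopNode("in_channel", 8, unroll_ic, children=[k])
    return LoopNode("out_channel", 16, children=[ic])


# Enumerate a few candidate designs and keep the one with the lowest estimate.
candidates = {
    "baseline": make_design(1, 1),
    "unroll_ic4": make_design(4, 1),
    "unroll_k9": make_design(1, 9),
}
best_name = min(candidates, key=lambda name: coarse_cycles(candidates[name]))
```

In this sketch every candidate performs the same number of operations, and only the cycle estimate changes with the unroll factors. A cycle-accurate refinement like the fine-grained model described above would require the generated source code and is outside the scope of this example.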
Pages: 28