Optimizing Convolutional Neural Network on FPGA under Heterogeneous Computing Framework with OpenCL

Cited by: 0
Authors
Wang, Zhengrong [1]
Qiao, Fei [1]
Liu, Zhen [2]
Shan, Yuxiang [3]
Zhou, Xunyi [3]
Luo, Li [2]
Yang, Huazhong [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Beijing Jiaotong Univ, Dept Elect Sci & Technol, Beijing, Peoples R China
[3] Samsung Telecom R&D Ctr, Beijing, Peoples R China
Keywords
FPGA; OpenCL; heterogeneous computing; CNN
DOI
Not available
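Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809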
Abstract
As convolutional neural networks (CNNs) are used more and more widely, in areas such as image classification and face recognition, traditional CPU or GPU platforms have become insufficient to run increasingly complex CNNs efficiently. Heterogeneous computing platforms, which contain a host and one or more computing devices such as GPUs and FPGAs, are therefore increasingly used to accelerate CNNs. Owing to its programmable hardware structure and high power efficiency, the FPGA is very promising for CNN acceleration. OpenCL is designed to provide the industry with a unified framework for heterogeneous computing platforms. As FPGA vendors have gradually begun to support OpenCL, it is now possible to use OpenCL on FPGAs, which makes FPGA development much easier. This paper uses the Xilinx SDAccel tool to explore how to accelerate CNNs on an FPGA within the OpenCL framework. Since the convolutional layer is the most complex part of a CNN, the paper first focuses on optimizing a single convolutional layer, then discusses the acceleration of a complete CNN, for which different optimization strategies are explored. With appropriate optimization measures, the computing speed of CNNs on the FPGA is improved. For the convolutional layer, a speedup of 14.4X is achieved. Moreover, with a pipelined structure the processing speed of a complete CNN is improved 2X, and the overall throughput is 48.5 fps. Additionally, the utilization of the FPGA chip's hardware resources is less than 30%, which suggests that using OpenCL on FPGAs to accelerate CNNs has great prospects as the tool chain becomes more sophisticated and is combined with FPGA hardware improvements and appropriate optimizations of CNN algorithms.
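For orientation, the workload that the abstract calls the most complex part of a CNN can be sketched as a plain loop nest. The code below is a minimal reference sketch in Python, not the authors' OpenCL kernel and not anything produced by SDAccel. The function names `conv2d_layer` and `mac_count`, the stride of 1, and the absence of padding are all assumptions made for illustration.

```python
def conv2d_layer(inputs, weights, bias):
    """Direct (loop-nest) convolution; assumes stride 1 and no padding.

    inputs:  [C_in][H][W]
    weights: [C_out][C_in][K][K]
    bias:    [C_out]
    returns: [C_out][H-K+1][W-K+1]
    """
    c_in = len(inputs)
    h, w = len(inputs[0]), len(inputs[0][0])
    c_out = len(weights)
    k = len(weights[0][0])
    out_h, out_w = h - k + 1, w - k + 1
    output = [[[0.0] * out_w for _ in range(out_h)] for _ in range(c_out)]
    for oc in range(c_out):              # output feature maps
        for y in range(out_h):           # output rows
            for x in range(out_w):       # output columns
                acc = bias[oc]
                for ic in range(c_in):   # input feature maps
                    for ky in range(k):  # kernel rows
                        for kx in range(k):  # kernel columns
                            acc += inputs[ic][y + ky][x + kx] * weights[oc][ic][ky][kx]
                output[oc][y][x] = acc
    return output


def mac_count(c_in, c_out, h, w, k):
    """Multiply-accumulate operations performed by conv2d_layer."""
    return c_out * (h - k + 1) * (w - k + 1) * c_in * k * k
```

The six nested loops are what make this layer expensive: the multiply-accumulate count grows with the product of all six loop bounds. The loop order chosen here is only one possibility and is not taken from the paper.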
Pages: 3433-3438
Number of pages: 6
Related papers
50 in total
  • [41] Acceleration and Implementation of Convolutional Neural Network Based on FPGA
    Wang, Enyi
    Qiu, Dehui
    [J]. PROCEEDINGS OF 2019 IEEE 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2019), 2019, : 321 - 325
  • [42] A Framework for Evaluating and Optimizing FPGA-Based SoCs for Aerospace Computing
    Wulf, Nicholas
    George, Alan D.
    Gordon-Ross, Ann
    [J]. ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2016, 10 (01)
  • [43] SecureAD: A Secure Video Anomaly Detection Framework on Convolutional Neural Network in Edge Computing Environment
    Cheng, Hang
    Liu, Ximeng
    Wang, Huaxiong
    Fang, Yan
    Wang, Meiqing
    Zhao, Xiaopeng
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2022, 10 (02) : 1413 - 1427
  • [44] Optimizing Temporal Convolutional Network Inference on FPGA-Based Accelerators
    Carreras, Marco
    Deriu, Gianfranco
    Raffo, Luigi
    Benini, Luca
    Meloni, Paolo
    [J]. IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2020, 10 (03) : 348 - 361
  • [45] A Convolutional Neural Network and Graph Convolutional Network Based Framework for AD Classification
    Lin, Lan
    Xiong, Min
    Zhang, Ge
    Kang, Wenjie
    Sun, Shen
    Wu, Shuicai
    [J]. SENSORS, 2023, 23 (04)
  • [46] Accelerating Convolutional Neural Network Inference in Split Computing: An In-Network Computing Approach
    Lee, Hochan
    Ko, Haneul
    Bae, Chanbin
    Pack, Sangheon
    [J]. 38TH INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING, ICOIN 2024, 2024, : 773 - 776
  • [47] Computing the Stereo Matching Cost with a Convolutional Neural Network
    Zbontar, Jure
    LeCun, Yann
    [J]. 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 1592 - 1599
  • [48] Stochastic computing in convolutional neural network implementation: a review
    Lee, Yang Yang
    Halim, Zaini Abdul
    [J]. PEERJ COMPUTER SCIENCE, 2020,
  • [49] Optimizing Stochastic Computing for Low Latency Inference of Convolutional Neural Networks
    Chen, Zhiyuan
    Ma, Yufei
    Wang, Zhongfeng
    [J]. 2020 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED-DESIGN (ICCAD), 2020,
  • [50] All Binarized Convolutional Neural Network and Its implementation on an FPGA
    Shimoda, Masayuki
    Sato, Shimpei
    Nakahara, Hiroki
    [J]. 2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 291 - 294