Optimizing Convolutional Neural Network on FPGA under Heterogeneous Computing Framework with OpenCL

Cited by: 0
Authors
Wang, Zhengrong [1]
Qiao, Fei [1]
Liu, Zhen [2]
Shan, Yuxiang [3]
Zhou, Xunyi [3]
Luo, Li [2]
Yang, Huazhong [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Beijing Jiaotong Univ, Dept Elect Sci & Technol, Beijing, Peoples R China
[3] Samsung Telecom R&D Ctr, Beijing, Peoples R China
Keywords
FPGA; OpenCL; heterogeneous computing; CNN
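DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract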
Convolutional neural networks (CNNs) are used ever more widely, for example in image classification and face recognition, and traditional CPU or GPU platforms are no longer sufficient to run increasingly complex CNNs efficiently. Heterogeneous computing platforms, which combine a host with one or more computing devices such as GPUs and FPGAs, are therefore increasingly used to accelerate CNNs. With its programmable hardware structure and high power efficiency, the FPGA is a very promising device for CNN acceleration. OpenCL is designed to provide the industry with a unified framework for heterogeneous computing platforms. As FPGA vendors have gradually begun to support OpenCL, it has become possible to use OpenCL on FPGAs, which makes FPGA development much easier. This paper uses the Xilinx SDAccel tool to explore how to accelerate CNNs on an FPGA within the OpenCL framework. Because the convolutional layer is the most complex part of a CNN, the paper first focuses on optimizing a single convolutional layer and then discusses the acceleration of a complete CNN, for which different optimization strategies are explored. With appropriate optimizations, the computing speed of the CNN on the FPGA is improved: a speedup of 14.4X is achieved for the convolutional layer, and a pipelined structure makes the processing of a CNN 2X faster, for an overall throughput of 48.5 fps. In addition, less than 30% of the FPGA chip's hardware resources are utilized, which suggests that using OpenCL on FPGAs to accelerate CNNs holds great promise as tool chains mature, FPGA hardware improves, and CNN algorithms are appropriately optimized.
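To make concrete the computation that the abstract calls the most complex part of a CNN, the following is a minimal reference sketch of a direct convolutional layer in plain Python. It is not the paper's OpenCL kernel, which the abstract does not show. The function name, the argument layout, and the choice of stride 1 with no padding are assumptions made here for illustration only.

```python
def conv2d_reference(inputs, weights, bias):
    """Direct convolution with stride 1 and no padding (illustrative assumption).

    inputs:  nested lists shaped [C_in][H][W]
    weights: nested lists shaped [C_out][C_in][K][K]
    bias:    list of length C_out
    returns: nested lists shaped [C_out][H-K+1][W-K+1]
    """
    c_in = len(inputs)
    height, width = len(inputs[0]), len(inputs[0][0])
    c_out = len(weights)
    k = len(weights[0][0])
    out_h, out_w = height - k + 1, width - k + 1

    output = [[[0.0] * out_w for _ in range(out_h)] for _ in range(c_out)]

    # Six nested loops: output channel, output row, output column,
    # then input channel and the K x K kernel window.
    for oc in range(c_out):
        for oy in range(out_h):
            for ox in range(out_w):
                acc = bias[oc]
                for ic in range(c_in):
                    for ky in range(k):
                        for kx in range(k):
                            acc += (inputs[ic][oy + ky][ox + kx]
                                    * weights[oc][ic][ky][kx])
                output[oc][oy][ox] = acc
    return output


# Hypothetical toy data: one 3x3 input channel, one 2x2 filter.
example_input = [[[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]]]
example_weights = [[[[1.0, 0.0],
                     [0.0, 1.0]]]]
example_output = conv2d_reference(example_input, example_weights, [0.5])
```

The multiply-accumulate in the innermost loops dominates the work, which is consistent with the abstract's decision to optimize the convolutional layer first. How these loops are mapped onto FPGA hardware is not described in the abstract.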
Pages: 3433-3438
Page count: 6
Related papers
50 in total
  • [1] Design of FPGA-Based Accelerator for Convolutional Neural Network under Heterogeneous Computing Framework with OpenCL
    Luo, Li
    Wu, Yakun
    Qiao, Fei
    Yang, Yi
    Wei, Qi
    Zhou, Xiaobo
    Fan, Yongkai
    Xu, Shuzheng
    Liu, Xinjun
    Yang, Huazhong
    [J]. INTERNATIONAL JOURNAL OF RECONFIGURABLE COMPUTING, 2018, 2018
  • [2] Optimizing OpenCL Implementation of Deep Convolutional Neural Network on FPGA
    Qiao, Yuran
    Shen, Junzhong
    Huang, Dafei
    Yang, Qianming
    Wen, Mei
    Zhang, Chunyuan
    [J]. NETWORK AND PARALLEL COMPUTING (NPC 2017), 2017, 10578 : 100 - 111
  • [3] Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network
    Zhang, Jialiang
    Li, Jing
    [J]. FPGA'17: PROCEEDINGS OF THE 2017 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2017, : 25 - 34
  • [4] Optimizing FPGA-Based Convolutional Neural Network Performance
    Kao, Chi-Chou
    [J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2023, 32 (15)
  • [5] FPGA Implementation of Convolutional Neural Network Based on Stochastic Computing
    Kim, Daewoo
    Moghaddam, Mansureh S.
    Moradian, Hossein
    Sim, Hyeonuk
    Lee, Jongeun
    Choi, Kiyoung
    [J]. 2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 287 - 290
  • [6] Optimizing Performance of Convolutional Neural Network Using Computing Technique
    Samudre, Pooja
    Shende, Prashant
    Jaiswal, Vishal
    [J]. 2019 IEEE 5TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2019,
  • [7] Optimizing Convolutional Neural Network Accelerator on Low-Cost FPGA
    Truong Quang Vinh
    Dinh Viet Hai
    [J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2021, 30 (11)
  • [8] A neural network framework for optimizing parallel computing in cloud servers
    de Lima, Everton C.
    Rossi, Fabio D.
    Luizelli, Marcelo C.
    Calheiros, Rodrigo N.
    Lorenzon, Arthur F.
    [J]. JOURNAL OF SYSTEMS ARCHITECTURE, 2024, 150
  • [9] Heterogeneous System Implementation of Deep Learning Neural Network for Object Detection in OpenCL Framework
    Li, Shuai
    Luo, Yukui
    Sun, Kuangyuan
    Choi, Ken
    [J]. 2018 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2018, : 456 - 459
  • [10] Efficient FPGA-Based Convolutional Neural Network Implementation for Edge Computing
    Cuong, Pham-Quoc
    Thinh, Tran Ngoc
    [J]. JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2023, 14 (03) : 479 - 487