Optimizing OpenCL Implementation of Deep Convolutional Neural Network on FPGA

Cited by: 2
Authors
Qiao, Yuran [1]
Shen, Junzhong [1]
Huang, Dafei [1]
Yang, Qianming [1]
Wen, Mei [1]
Zhang, Chunyuan [1]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Natl Key Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China
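Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National High Technology Research and Development Program of China (863 Program)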
DOI
10.1007/978-3-319-68210-5_9
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The rapid growth of data across the Internet has provided enough labeled data to train deep artificial neural networks. While deeper networks bring significant precision gains in many applications, they also demand higher computation capacity at the expense of power consumption. To this end, various FPGA-based deep neural network accelerators have been proposed for higher performance and lower energy consumption. However, the development cycle of an FPGA application is much longer than that of a CPU or GPU application. Although FPGA vendors such as Altera and Xilinx have released OpenCL frameworks to ease programming, tuning OpenCL code for desirable performance on FPGAs is still challenging. In this paper, we look into the OpenCL implementation of a Convolutional Neural Network (CNN) on FPGA. By analyzing how a CPU/GPU-oriented version executes on FPGA, we identify the causes of the performance difference between FPGA and CPU/GPU and locate the performance bottlenecks. Based on this analysis, we put forward a corresponding optimization method focusing on external memory transfers. We implement a prototype system on an Altera Stratix V A7 FPGA, which achieves a 4.76x speedup over the original version. To the best of our knowledge, this implementation outperforms most previous OpenCL implementations on FPGA by a large margin.
Pages: 100-111
Page count: 12
Related papers
50 in total
  • [1] Optimizing Convolutional Neural Network on FPGA under Heterogeneous Computing Framework with OpenCL
    Wang, Zhengrong
    Qiao, Fei
    Liu, Zhen
    Shan, Yuxiang
    Zhou, Xunyi
    Luo, Li
    Yang, Huazhong
    [J]. PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON), 2016, : 3433 - 3438
  • [2] Optimizing Accelerator on FPGA for Deep Convolutional Neural Networks
    Dong, Yong
    Hu, Wei
    Wang, Yonghao
    Jiao, Qiang
    Chen, Shuang
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2020, PT II, 2020, 12453 : 97 - 110
  • [3] Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network
    Zhang, Jialiang
    Li, Jing
    [J]. FPGA'17: PROCEEDINGS OF THE 2017 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2017, : 25 - 34
  • [4] Acceleration and Implementation of Convolutional Neural Network Based on FPGA
    Wang, Enyi
    Qiu, Dehui
    [J]. PROCEEDINGS OF 2019 IEEE 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2019), 2019, : 321 - 325
  • [5] Design and Implementation of Configurable Convolutional Neural Network on FPGA
    Huynh Vinh Phu
    Tran Minh Tan
    Phan Van Men
    Nguyen Van Hieu
    Truong Van Cuong
    [J]. PROCEEDINGS OF 2019 6TH NATIONAL FOUNDATION FOR SCIENCE AND TECHNOLOGY DEVELOPMENT (NAFOSTED) CONFERENCE ON INFORMATION AND COMPUTER SCIENCE (NICS), 2019, : 298 - 302
  • [6] Optimizing FPGA-Based Convolutional Neural Network Performance
    Kao, Chi-Chou
    [J]. JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2023, 32 (15)
  • [7] FPGA implementation of epileptic seizure detection using semisupervised reduced deep convolutional neural network
    Sahani, Mrutyunjaya
    Rout, Susanta Kumar
    Dash, Pradipta Kishore
    [J]. APPLIED SOFT COMPUTING, 2021, 110
  • [8] Efficient multiquality super-resolution using a deep convolutional neural network for an FPGA implementation
    Kim, Min Beom
    Lee, Sanglyn
    Kim, Ilho
    Hong, Hee Jung
    Kim, Chang Gone
    Yoon, Soo Young
    [J]. JOURNAL OF THE SOCIETY FOR INFORMATION DISPLAY, 2020, 28 (05) : 428 - 439
  • [9] All Binarized Convolutional Neural Network and Its implementation on an FPGA
    Shimoda, Masayuki
    Sato, Shimpei
    Nakahara, Hiroki
    [J]. 2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 291 - 294
  • [10] Efficient FPGA Implementation of Local Binary Convolutional Neural Network
    Zhakatayev, Aidyn
    Lee, Jongeun
    [J]. 24TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC 2019), 2019, : 699 - 704