Optimizing Convolutional Neural Network on FPGA under Heterogeneous Computing Framework with OpenCL

Cited by: 0

Authors:

Wang, Zhengrong [1]

Qiao, Fei [1]

Liu, Zhen [2]

Shan, Yuxiang [3]

Zhou, Xunyi [3]

Luo, Li [2]

Yang, Huazhong [1]

Affiliations:

[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China

[2] Beijing Jiaotong Univ, Dept Elect Sci & Technol, Beijing, Peoples R China

[3] Samsung Telecom R&D Ctr, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON) | 2016

Keywords:

FPGA; OpenCL; heterogeneous computing; CNN

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

As convolutional neural networks (CNNs) are used ever more widely, in areas such as image classification and face recognition, traditional CPU or GPU platforms have become insufficient to run increasingly complex CNNs efficiently. Heterogeneous computing platforms, which contain a host and one or more computing devices such as GPUs and FPGAs, are therefore increasingly used to accelerate CNNs. Owing to its programmable hardware structure and high power efficiency, the FPGA is very promising for CNN acceleration. OpenCL is designed to provide the industry with a unified framework for heterogeneous computing platforms. As FPGA vendors have gradually begun to support OpenCL, it has become possible to use OpenCL on FPGAs, which makes FPGA development much easier. This paper uses the Xilinx SDAccel tool to explore how to accelerate CNNs on an FPGA within the OpenCL framework. Since the convolutional layer is the most complex part of a CNN, the paper first focuses on optimizing a single convolutional layer and then discusses the acceleration of a complete CNN, for which different optimization strategies are explored. With appropriate optimization measures, the computing speed of CNNs on the FPGA is improved: a speedup of 14.4X is achieved for the convolutional layer, and a pipelined structure improves the processing speed of a CNN by 2X, giving an overall throughput of 48.5 fps. Additionally, the utilization of the FPGA chip's hardware resources is less than 30%, which suggests that using OpenCL on FPGAs to accelerate CNNs has great prospects given an increasingly sophisticated tool chain, improvements in the supporting FPGA hardware, and appropriate optimizations of CNN algorithms.
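
To make concrete what a single convolutional layer computes, the following is a minimal reference sketch of a direct convolution as a six-level loop nest. It is not the paper's OpenCL kernel: the function names `conv2d_layer` and `mac_count`, the data layout, and the choice of stride 1 with no padding are all assumptions made for illustration. Loop nests of this shape are the usual target of accelerator optimizations such as unrolling and pipelining, and the multiply-accumulate count shows why this layer tends to dominate the workload.

```python
def conv2d_layer(inputs, weights, biases):
    """Naive direct convolution (assumed: stride 1, no padding).

    inputs:  [c_in][h][w]          nested lists of floats
    weights: [c_out][c_in][k][k]   square kernels
    biases:  [c_out]
    Returns outputs of shape [c_out][h - k + 1][w - k + 1].
    """
    c_in = len(inputs)
    h, w = len(inputs[0]), len(inputs[0][0])
    c_out = len(weights)
    k = len(weights[0][0])
    out_h, out_w = h - k + 1, w - k + 1

    outputs = [[[0.0] * out_w for _ in range(out_h)] for _ in range(c_out)]
    for oc in range(c_out):                # output feature maps
        for oy in range(out_h):            # output rows
            for ox in range(out_w):        # output columns
                acc = biases[oc]
                for ic in range(c_in):     # input feature maps
                    for ky in range(k):    # kernel rows
                        for kx in range(k):  # kernel columns
                            acc += inputs[ic][oy + ky][ox + kx] * weights[oc][ic][ky][kx]
                outputs[oc][oy][ox] = acc
    return outputs


def mac_count(c_in, h, w, c_out, k):
    """Multiply-accumulate operations performed by conv2d_layer."""
    return c_out * (h - k + 1) * (w - k + 1) * c_in * k * k


# Illustrative input: one 3x3 channel, one 2x2 kernel that sums the diagonal.
image = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
kernel = [[[[1.0, 0.0], [0.0, 1.0]]]]
print(conv2d_layer(image, kernel, [0.5]))  # [[[6.5, 8.5], [12.5, 14.5]]]
print(mac_count(1, 3, 3, 1, 2))            # 16
```

The innermost three loops form an independent dot product per output element, which is why hardware implementations commonly parallelize across them.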

Pages: 3433-3438

Page count: 6
