Optimizing Convolutional Neural Network on FPGA under Heterogeneous Computing Framework with OpenCL

Cited by: 0

Authors:

Wang, Zhengrong ^{[1]}

Qiao, Fei ^{[1]}

Liu, Zhen ^{[2]}

Shan, Yuxiang ^{[3]}

Zhou, Xunyi ^{[3]}

Luo, Li ^{[2]}

Yang, Huazhong ^{[1]}

Affiliations:

[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China

[2] Beijing Jiaotong Univ, Dept Elect Sci & Technol, Beijing, Peoples R China

[3] Samsung Telecom R&D Ctr, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON) | 2016

Keywords:

FPGA; OpenCL; heterogeneous computing; CNN;

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

As convolutional neural networks (CNNs) are used more and more widely, for example in image classification and face recognition, traditional CPU or GPU platforms have become insufficient to run increasingly complex CNNs efficiently. Therefore, heterogeneous computing platforms, which contain a host and one or more computing devices such as GPUs and FPGAs, are increasingly used to accelerate CNNs. Owing to its programmable hardware structure and high power efficiency, the FPGA is very promising for CNN acceleration. OpenCL is designed to provide the industry with a unified framework for heterogeneous computing platforms. As FPGA vendors have gradually begun to support OpenCL, it has become possible to use OpenCL on FPGAs, which makes FPGA development much easier. This paper uses the Xilinx SDAccel tool to explore how to accelerate CNNs on an FPGA within the OpenCL framework. Since the convolutional layer is the most complex part of a CNN, the paper first focuses on optimizing a single convolutional layer, then discusses the acceleration of a complete CNN, for which different optimization strategies are explored. With appropriate optimization measures, the computing speed of CNNs on the FPGA is improved. For the convolutional layer, a speedup of 14.4X is achieved. Moreover, the processing speed of one CNN is improved 2X with a pipelined structure, and the overall throughput is 48.5 fps. Additionally, the utilization of the FPGA chip's hardware resources is less than 30%, which suggests that using OpenCL on FPGAs to accelerate CNNs has great prospects given an increasingly sophisticated tool chain, supporting improvements in FPGA hardware, and appropriate optimizations of CNN algorithms.


Pages: 3433-3438

Page count: 6

