Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs

Cited by: 101
Authors
Liang, Yun [1 ,2 ]
Lu, Liqiang [3 ]
Xiao, Qingcheng [3 ]
Yan, Shengen [4 ]
Affiliations
[1] Peking Univ, Sch EECS, Beijing 100871, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China
[3] Peking Univ, Ctr Energy Efficient Comp & Applicat, Beijing 100871, Peoples R China
[4] SenseTime, Algorithm Platform Dept, Hong Kong, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Field programmable gate arrays; Convolution; Space exploration; Prediction algorithms; Transforms; Analytical models; Convolutional neural networks; Convolutional neural network (CNN); fast algorithm; fast Fourier transformation (FFT); field-programmable gate array (FPGA); Winograd; HIGH-LEVEL SYNTHESIS; PERFORMANCE;
DOI
10.1109/TCAD.2019.2897701
CLC number
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
In recent years, convolutional neural networks (CNNs) have become widely adopted for computer vision tasks. Field-programmable gate arrays (FPGAs) have been widely explored as promising hardware accelerators for CNNs due to their high performance, energy efficiency, and reconfigurability. However, prior FPGA solutions based on the conventional convolution algorithm are often bounded by the computational capability of FPGAs (e.g., the number of DSPs). To address this problem, fast algorithms transform the feature maps into a special domain to reduce the arithmetic complexity. Winograd and the fast Fourier transformation (FFT), as representative fast algorithms, first transform the input data and filters into the Winograd or frequency domain, then perform element-wise multiplication, and finally apply the inverse transformation to obtain the output. In this paper, we propose a novel architecture for implementing fast algorithms on FPGAs. Our design employs a line buffer structure to effectively reuse feature map data among different tiles. We also pipeline the Winograd/FFT processing element (PE) engine and instantiate multiple PEs through parallelization. Meanwhile, there exists a complex design space to explore. We propose an analytical model to predict the resource usage and performance, and use it to guide a fast design space exploration. Experiments using state-of-the-art CNNs demonstrate the best performance and energy efficiency on FPGAs. We achieve 854.6 and 2479.6 GOP/s for AlexNet and VGG16, respectively, on the Xilinx ZCU102 platform using Winograd, and 130.4 GOP/s for ResNet using Winograd and 201.1 GOP/s for YOLO using FFT on the Xilinx ZC706 platform.
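To make the transform / element-wise multiply / inverse-transform flow concrete, the following is a minimal Python (NumPy) sketch of the 1-D Winograd minimal filtering algorithm F(2, 3), which produces two convolution outputs from a 4-element input tile using 4 multiplications instead of the 6 needed by direct convolution. It illustrates the general fast-algorithm idea discussed in the abstract, not the paper's FPGA architecture; the matrices BT, G, and AT are the standard F(2, 3) transform matrices.

import numpy as np

# Standard Winograd F(2, 3) transform matrices (illustrative sketch only).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])                    # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)    # inverse (output) transform

def winograd_f23(d, g):
    """Two outputs of a 3-tap convolution over a 4-element tile,
    computed with 4 multiplications in the Winograd domain."""
    U = G @ g      # transform the filter
    V = BT @ d     # transform the input tile
    M = U * V      # element-wise multiplication in the Winograd domain
    return AT @ M  # inverse transform back to the output domain

# Sanity check against direct convolution on a random tile.
d = np.random.rand(4)
g = np.random.rand(3)
direct = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(winograd_f23(d, g), direct)

The same structure extends to 2-D tiles (e.g., F(2x2, 3x3)) and, with DFT matrices in place of BT/G/AT, to the FFT-based variant; in both cases the transforms trade multiplications for additions, which is what relieves DSP pressure on the FPGA.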
Pages: 857-870
Page count: 14