Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs

Cited by: 101
Authors
Liang, Yun [1 ,2 ]
Lu, Liqiang [3 ]
Xiao, Qingcheng [3 ]
Yan, Shengen [4 ]
Affiliations
[1] Peking Univ, Sch EECS, Beijing 100871, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China
[3] Peking Univ, Ctr Energy Efficient Comp & Applicat, Beijing 100871, Peoples R China
[4] SenseTime, Algorithm Platform Dept, Hong Kong, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Field programmable gate arrays; Convolution; Space exploration; Prediction algorithms; Transforms; Analytical models; Convolutional neural networks; Convolutional neural network (CNN); fast algorithm; fast Fourier transformation (FFT); field-programmable gate array (FPGA); Winograd; HIGH-LEVEL SYNTHESIS; PERFORMANCE;
DOI
10.1109/TCAD.2019.2897701
CLC number
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
In recent years, convolutional neural networks (CNNs) have become widely adopted for computer vision tasks. Field-programmable gate arrays (FPGAs) have been widely explored as promising hardware accelerators for CNNs due to their high performance, energy efficiency, and reconfigurability. However, prior FPGA solutions based on the conventional convolution algorithm are often bounded by the computational capability of FPGAs (e.g., the number of DSPs). To address this problem, fast algorithms transform the feature maps into a special domain to reduce the arithmetic complexity. Winograd and the fast Fourier transformation (FFT), as representative fast algorithms, first transform the input data and filters into the Winograd or frequency domain, then perform element-wise multiplication, and finally apply the inverse transformation to obtain the output. In this paper, we propose a novel architecture for implementing fast algorithms on FPGAs. Our design employs a line buffer structure to effectively reuse feature map data among different tiles. We also pipeline the Winograd/FFT processing element (PE) engine and instantiate multiple PEs through parallelization. Meanwhile, there exists a complex design space to explore. We propose an analytical model to predict the resource usage and performance, and use it to guide a fast design space exploration. Experiments using state-of-the-art CNNs demonstrate the best performance and energy efficiency on FPGAs. We achieve 854.6 and 2479.6 GOP/s for AlexNet and VGG16, respectively, on the Xilinx ZCU102 platform using Winograd, and 130.4 GOP/s for ResNet using Winograd and 201.1 GOP/s for YOLO using FFT on the Xilinx ZC706 platform.
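To make the transform / element-wise multiply / inverse-transform flow concrete, the following is a minimal Python (NumPy) sketch of the 1-D Winograd minimal filtering algorithm F(2, 3), which produces two convolution outputs from a 4-element input tile using 4 multiplications instead of the 6 needed by direct convolution. It illustrates the general fast-algorithm idea discussed in the abstract, not the paper's FPGA architecture; the matrices BT, G, and AT are the standard F(2, 3) transform matrices.

import numpy as np

# Standard Winograd F(2, 3) transform matrices (illustrative sketch only).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])                    # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)    # inverse (output) transform

def winograd_f23(d, g):
    """Two outputs of a 3-tap convolution over a 4-element tile,
    computed with 4 multiplications in the Winograd domain."""
    U = G @ g      # transform the filter
    V = BT @ d     # transform the input tile
    M = U * V      # element-wise multiplication in the Winograd domain
    return AT @ M  # inverse transform back to the output domain

# Sanity check against direct convolution on a random tile.
d = np.random.rand(4)
g = np.random.rand(3)
direct = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(winograd_f23(d, g), direct)

The same structure extends to 2-D tiles (e.g., F(2x2, 3x3)) and, with DFT matrices in place of BT/G/AT, to the FFT-based variant; in both cases the transforms trade multiplications for additions, which is what relieves DSP pressure on the FPGA.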
Pages: 857-870
Page count: 14