Achieving Super-Linear Speedup across Multi-FPGA for Real-Time DNN Inference

Cited by: 46
Authors
Jiang, Weiwen [1, 2, 3]
Sha, Edwin H-M [1]
Zhang, Xinyi [2]
Yang, Lei [2]
Zhuge, Qingfeng [1]
Shi, Yiyu [3]
Hu, Jingtong [2]
Affiliations
[1] East China Normal Univ, Shanghai, Peoples R China
[2] Univ Pittsburgh, Pittsburgh, PA 15260 USA
[3] Univ Notre Dame, Notre Dame, IN 46556 USA
Funding
National Natural Science Foundation of China; National Science Foundation (USA)
Keywords
FPGA; DNN inference; real-time; parallel computing;
DOI
10.1145/3358192
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Real-time Deep Neural Network (DNN) inference with low-latency requirements has become increasingly important for numerous applications in both cloud computing (e.g., Apple's Siri) and edge computing (e.g., Google/Waymo's driverless car). FPGA-based DNN accelerators have demonstrated both superior flexibility and performance; in addition, for real-time inference with low batch size, FPGAs are expected to achieve further performance improvement. However, the performance gain from a single-FPGA design is limited by on-chip resources. In this paper, we employ multiple FPGAs to cooperatively run DNNs with the objective of achieving super-linear speedup over a single-FPGA design. In implementing such systems, we found two barriers to this goal: (1) the lack of a clear partition scheme for each DNN layer to fully exploit parallelism, and (2) the insufficient bandwidth between the off-chip memory and the accelerator due to the growing size of DNNs. To tackle these issues, we propose a general framework, "Super-LIP", which can support different kinds of DNNs. In this paper, we take the Convolutional Neural Network (CNN) as a vehicle to illustrate Super-LIP. We first formulate an accurate system-level model to support the exploration of the best partition schemes. Then, we develop a novel design methodology that alleviates the heavy load on memory bandwidth by moving traffic from the memory bus to inter-FPGA links. We implement Super-LIP on ZCU102 FPGA boards. Results demonstrate that Super-LIP with 2 FPGAs achieves a 3.48x speedup over the state-of-the-art single-FPGA design. Moreover, as the number of FPGAs scales up, the system latency can be further reduced while maintaining high energy efficiency.
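As a reading aid for the headline result, the arithmetic behind "super-linear" can be sketched as follows. Speedup is called super-linear when it exceeds the number of devices, so that the per-device efficiency is above 1. The function names below are hypothetical helpers introduced only for this illustration and do not come from the paper; the only figures used are the 3.48x speedup and the 2 FPGAs reported in the abstract.

```python
def parallel_efficiency(speedup: float, num_devices: int) -> float:
    """Return speedup per device; a value above 1.0 means super-linear scaling."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return speedup / num_devices


def is_super_linear(speedup: float, num_devices: int) -> bool:
    """Speedup is super-linear when it exceeds the device count."""
    return speedup > num_devices


# Figures taken from the abstract: 2 FPGAs, 3.48x over the single-FPGA baseline.
reported_speedup = 3.48
num_fpgas = 2

efficiency = parallel_efficiency(reported_speedup, num_fpgas)
print(f"per-FPGA efficiency: {efficiency:.2f}")
print(f"super-linear: {is_super_linear(reported_speedup, num_fpgas)}")
```

With the reported figures, the per-FPGA efficiency is 3.48 / 2 = 1.74, which is above the value of 1.0 that ideal linear scaling would give.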
Pages: 23