Achieving Super-Linear Speedup across Multi-FPGA for Real-Time DNN Inference

Cited by: 46

Authors:

Jiang, Weiwen ^{[1,2,3]}

Sha, Edwin H-M ^{[1]}

Zhang, Xinyi ^{[2]}

Yang, Lei ^{[2]}

Zhuge, Qingfeng ^{[1]}

Shi, Yiyu ^{[3]}

Hu, Jingtong ^{[2]}

Affiliations:

[1] East China Normal Univ, Shanghai, Peoples R China

[2] Univ Pittsburgh, Pittsburgh, PA 15260 USA

[3] Univ Notre Dame, Notre Dame, IN 46556 USA

Source:

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS | 2019 / Vol. 18 / Issue 05

Funding:

National Natural Science Foundation of China; U.S. National Science Foundation;

Keywords:

FPGA; DNN inference; real-time; parallel computing;

DOI:

10.1145/3358192

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Real-time Deep Neural Network (DNN) inference with low-latency requirements has become increasingly important for numerous applications in both cloud computing (e.g., Apple's Siri) and edge computing (e.g., Google/Waymo's driverless car). FPGA-based DNN accelerators have demonstrated both superior flexibility and performance; in addition, for real-time inference with low batch size, FPGAs are expected to achieve further performance improvement. However, the performance gain from a single-FPGA design is obstructed by limited on-chip resources. In this paper, we employ multiple FPGAs to cooperatively run DNNs with the objective of achieving super-linear speedup over a single-FPGA design. In implementing such systems, we found two barriers that hinder us from achieving the design goal: (1) the lack of a clear partition scheme for each DNN layer to fully exploit parallelism, and (2) the insufficient bandwidth between the off-chip memory and the accelerator due to the growing size of DNNs. To tackle these issues, we propose a general framework, "Super-LIP", which can support different kinds of DNNs. In this paper, we use Convolutional Neural Networks (CNNs) as a vehicle to illustrate Super-LIP. We first formulate an accurate system-level model to support the exploration of the best partition schemes. Then, we develop a novel design methodology to effectively alleviate the heavy loads on memory bandwidth by moving traffic from the memory bus to inter-FPGA links. We implement Super-LIP based on ZCU102 FPGA boards. Results demonstrate that Super-LIP with 2 FPGAs can achieve 3.48x speedup, compared to the state-of-the-art single-FPGA design. Moreover, as the number of FPGAs scales up, the system latency can be further reduced while maintaining high energy efficiency.
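
Note: the "super-linear" claim in the abstract can be checked with simple arithmetic. The sketch below uses the conventional definitions (speedup as the ratio of single-device to multi-device latency, super-linear when the speedup exceeds the device count). It is only an illustration: the function names are hypothetical and nothing here is taken from the paper's own model; only the figures 3.48x and 2 FPGAs come from the abstract.

```python
def speedup(t_single: float, t_multi: float) -> float:
    """Conventional speedup: single-device latency over multi-device latency."""
    return t_single / t_multi


def is_super_linear(s: float, n_devices: int) -> bool:
    """A speedup is super-linear when it exceeds the number of devices."""
    return s > n_devices


def parallel_efficiency(s: float, n_devices: int) -> float:
    """Speedup per device; values above 1.0 indicate super-linear scaling."""
    return s / n_devices


# Figures reported in the abstract: 3.48x speedup with 2 FPGAs.
reported_speedup = 3.48
n_fpgas = 2

print(is_super_linear(reported_speedup, n_fpgas))                 # True
print(round(parallel_efficiency(reported_speedup, n_fpgas), 2))   # 1.74
```

Under these definitions, each of the two FPGAs contributes about 1.74x the throughput of the single-FPGA baseline, which is what makes the reported result super-linear.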


Pages: 23
