AMAIX: A Generic Analytical Model for Deep Learning Accelerators

Cited by: 2
Authors:
Juenger, Lukas [1 ]
Zurstrassen, Niko [1 ]
Kogel, Tim [2 ]
Keding, Holger [2 ]
Leupers, Rainer [1 ]
Affiliations:
[1] Rhein Westfal TH Aachen, Inst Commun Technol & Embedded Syst ICE, Aachen, Germany
[2] Synopsys GmbH, Aschheim, Germany
Keywords:
Deep Learning Accelerators; Analytical models; Design space exploration; Roofline model;
DOI:
10.1007/978-3-030-60939-9_3
CLC Number: TP3 [Computing technology; computer technology]
Discipline Code: 0812
Abstract:
In recent years, the growing popularity of Convolutional Neural Networks (CNNs) has driven the development of specialized hardware, so-called Deep Learning Accelerators (DLAs). The large market for DLAs and the huge number of papers published on DLA design show that there is currently no one-size-fits-all solution. Depending on the optimization goals, such as power consumption or performance, there may be several optimal solutions for each scenario. A commonly used method for finding these solutions as early as possible in the design cycle is the use of analytical models, which describe a design with simple yet insightful and sufficiently accurate formulas. The main contribution of this work is the generic Analytical Model for AI accelerators (AMAIX) for the estimation of CNN inference performance on DLAs. It is based on the popular Roofline model. To show the validity of our approach, AMAIX was applied to the Nvidia Deep Learning Accelerator (NVDLA) as a case study, using the AlexNet and LeNet CNNs as workloads. The resulting performance predictions were verified against an RTL emulation of the NVDLA using a Synopsys ZeBu Server-based hybrid prototype. AMAIX predicted the inference time of AlexNet and LeNet on the NVDLA with an accuracy of up to 88% and 98%, respectively. Furthermore, this work shows how to use the obtained results for root-cause analysis and as a starting point for design space exploration.
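The abstract's central idea, estimating per-layer inference time with a Roofline model, can be sketched in a few lines: attainable throughput is the minimum of the accelerator's compute peak and its memory bandwidth times the workload's operational intensity. The sketch below is a minimal illustration with hypothetical hardware and layer numbers, not the AMAIX model itself or NVDLA parameters.

```python
def roofline_time(ops, bytes_moved, peak_flops, peak_bw):
    """Roofline-style lower bound on execution time in seconds.

    ops         -- arithmetic operations of the layer (FLOPs)
    bytes_moved -- data transferred to/from memory (bytes)
    peak_flops  -- accelerator compute peak (FLOP/s)
    peak_bw     -- memory bandwidth (byte/s)
    """
    oi = ops / bytes_moved                      # operational intensity (FLOP/byte)
    attainable = min(peak_flops, peak_bw * oi)  # the Roofline ceiling (FLOP/s)
    return ops / attainable

# Hypothetical layer: 1 GFLOP, 50 MB traffic, on a 2 TFLOP/s, 25 GB/s device.
# oi = 20 FLOP/byte, so the layer is memory-bound (20 * 25 GB/s < 2 TFLOP/s).
t = roofline_time(1e9, 50e6, 2e12, 25e9)
print(t)  # 0.002 s
```

Summing such per-layer estimates over a network gives a first-order inference-time prediction of the kind the paper validates against RTL emulation.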
Pages: 36-51
Page count: 16