Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators

Cited by: 0
Authors
Luebeck, Konstantin [1 ]
Jung, Alexander Louis-Ferdinand [1 ]
Wedlich, Felix [1 ]
Mueller, Mika Markus [1 ]
Peccia, Federico Nicolas [2 ]
Thoemmes, Felix [2 ]
Steinmetz, Jannik [1 ]
Biermaier, Valentin [1 ]
Frischknecht, Adrian [1 ]
Bernardo, Paul Palomero [1 ]
Bringmann, Oliver [1 ]
Affiliations
[1] University of Tübingen, Embedded Systems, Tübingen, Baden-Württemberg, Germany
[2] FZI, Karlsruhe, Baden-Württemberg, Germany
Keywords
Deep neural networks; performance estimation; analytical model
DOI
10.1145/3715122
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their performance characteristics when executing the intended AI workload. To facilitate this, we present an automated generation approach for fast performance models that accurately estimate the latency of a DNN mapped onto systematically modeled and concisely described accelerator architectures. Using our accelerator architecture description method, we modeled representative DNN accelerators such as Gemmini, UltraTrail, a Plasticine-derived architecture, and a parameterizable systolic array. Together with DNN mappings for those modeled architectures, we perform a combined DNN/hardware dependency graph analysis, which, in the best case, lets us estimate the performance of 4.19 billion instructions by evaluating only 154 loop kernel iterations, yielding a significant speedup. We outperform regression and analytical models in terms of mean absolute percentage error (MAPE) compared with simulation results, while being several orders of magnitude faster than an RTL simulation.
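To make the abstract's central claim concrete, here is a minimal sketch of the underlying idea: rather than simulating every instruction, the cost of one representative iteration per loop kernel is scaled by that kernel's total iteration count. All names (`LoopKernel`, `estimate_latency`) and numbers are illustrative assumptions, not the paper's actual model or API.

```python
from dataclasses import dataclass

@dataclass
class LoopKernel:
    name: str
    iterations: int              # total iterations of this kernel in the DNN mapping
    cycles_per_iteration: float  # cost of one representative iteration (modeled once)

def estimate_latency(kernels: list[LoopKernel]) -> float:
    """Estimate total latency by scaling one representative iteration's
    cost by each kernel's iteration count, instead of simulating all of them."""
    return sum(k.iterations * k.cycles_per_iteration for k in kernels)

# Hypothetical mapping with two loop kernels, each evaluated once.
kernels = [
    LoopKernel("conv2d_inner", iterations=1_000_000, cycles_per_iteration=12.0),
    LoopKernel("fc_inner", iterations=50_000, cycles_per_iteration=4.0),
]
print(estimate_latency(kernels))  # 12200000.0 cycles
```

Only two per-iteration costs are evaluated here, yet the estimate covers over a million iterations, which is the source of the speedup over cycle-accurate RTL simulation.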
Pages: 32