Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators

Cited: 0
Authors
Luebeck, Konstantin [1 ]
Jung, Alexander Louis-Ferdinand [1 ]
Wedlich, Felix [1 ]
Mueller, Mika Markus [1 ]
Peccia, Federico Nicolas [2 ]
Thoemmes, Felix [2 ]
Steinmetz, Jannik [1 ]
Biermaier, Valentin [1 ]
Frischknecht, Adrian [1 ]
Bernardo, Paul Palomero [1 ]
Bringmann, Oliver [1 ]
Affiliations
[1] Univ Tubingen, Embedded Syst, Tubingen, Baden-Wurttemberg, Germany
[2] FZI, Karlsruhe, Baden-Wurttemberg, Germany
Keywords
Deep neural networks; performance estimation; analytical model
DOI
10.1145/3715122
Chinese Library Classification
TP3 [computing technology, computer technology]
Subject classification code
0812
Abstract
Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their performance characteristics when executing the intended AI workload. To facilitate this, we present an automated generation approach for fast performance models to accurately estimate the latency of a DNN mapped onto systematically modeled and concisely described accelerator architectures. Using our accelerator architecture description method, we modeled representative DNN accelerators such as Gemmini, UltraTrail, Plasticine-derived, and a parameterizable systolic array. Together with DNN mappings for those modeled architectures, we perform a combined DNN/hardware dependency graph analysis, which enables us, in the best case, to evaluate only 154 loop kernel iterations to estimate the performance of 4.19 billion instructions, achieving a significant speedup. We outperform regression and analytical models in terms of mean absolute percentage error (MAPE) compared with simulation results, while being several orders of magnitude faster than an RTL simulation.
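The core idea the abstract describes, evaluating only a handful of unique loop kernels instead of every instruction, can be sketched as follows. This is an illustrative assumption of how such an estimate works, not the paper's actual model; the kernel names, cycle counts, and the `estimate_latency` helper are all hypothetical.

```python
# Hypothetical sketch: evaluate each unique loop kernel once and scale its
# latency by its repetition count, rather than simulating every iteration.
from collections import Counter

def estimate_latency(kernel_trace, kernel_latency):
    """Estimate total latency from a trace of kernel identifiers.

    kernel_trace   -- sequence of kernel identifiers, one per iteration
    kernel_latency -- callable mapping a kernel id to its cycle count,
                      evaluated once per *unique* kernel
    """
    counts = Counter(kernel_trace)            # repetitions per unique kernel
    return sum(kernel_latency(k) * n          # one evaluation, scaled by count
               for k, n in counts.items())

# Toy usage: three unique kernels cover a trace of one million iterations,
# so only three kernel evaluations are needed for the full estimate.
trace = ["conv3x3"] * 600_000 + ["dwconv"] * 300_000 + ["fc"] * 100_000
latencies = {"conv3x3": 120, "dwconv": 45, "fc": 800}
total_cycles = estimate_latency(trace, latencies.__getitem__)
```

The speedup comes from the ratio of trace length to unique kernels; the paper's best case (154 kernel iterations standing in for 4.19 billion instructions) is the same principle at scale.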
Pages: 32
Related Papers (50 in total)
  • [1] Work-in-Progress: Ultra-fast yet Accurate Performance Prediction for Deep Neural Network Accelerators
    Luebeck, Konstantin
    Jung, Alexander Louis-Ferdinand
    Wedlich, Felix
    Bringmann, Oliver
    2022 INTERNATIONAL CONFERENCE ON COMPILERS, ARCHITECTURE, AND SYNTHESIS FOR EMBEDDED SYSTEMS (CASES 2022), 2022, : 27 - 28
  • [2] Automatic Kernel Generation for Large Language Models on Deep Learning Accelerators
    Wang, Fuyu
    Shen, Minghua
    2023 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD, 2023,
  • [3] Fast Loosely-Timed Deep Neural Network Models with Accurate Memory Contention
    Arasteh, Emad M.
    Domer, Rainer
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2024, 23 (05)
  • [4] SEALing Neural Network Models in Encrypted Deep Learning Accelerators
    Zuo, Pengfei
    Hua, Yu
    Liang, Ling
    Xie, Xinfeng
    Hu, Xing
    Xie, Yuan
    2021 58TH ACM/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2021, : 1255 - 1260
  • [5] Automatic Tool for Fast Generation of Custom Convolutional Neural Networks Accelerators for FPGA
    Rivera-Acosta, Miguel
    Ortega-Cisneros, Susana
    Rivera, Jorge
    ELECTRONICS, 2019, 8 (06)
  • [6] Joint Protection Scheme for Deep Neural Network Hardware Accelerators and Models
    Zhou, Jingbo
    Zhang, Xinmiao
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2023, 42 (12) : 4518 - 4527
  • [7] SHE: A Fast and Accurate Deep Neural Network for Encrypted Data
    Lou, Qian
    Jiang, Lei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [8] Fast Inner-Product Algorithms and Architectures for Deep Neural Network Accelerators
    Pogue, Trevor E.
    Nicolici, Nicola
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (02) : 495 - 509
  • [9] Fast-AT: Fast Automatic Thumbnail Generation using Deep Neural Networks
    Esmaeili, Seyed A.
    Singh, Bharat
    Davis, Larry S.
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4178 - 4186
  • [10] NNSim: A Fast and Accurate SystemC/TLM Simulator for Deep Convolutional Neural Network Accelerators
    Lee, Yi-Che
    Hsu, Ting-Shuo
    Chen, Chun-Tse
    Liou, Jing-Jia
    Lu, Juin-Ming
    2019 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), 2019,