Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators

Cited by: 0
Authors
Luebeck, Konstantin [1 ]
Jung, Alexander Louis-Ferdinand [1 ]
Wedlich, Felix [1 ]
Mueller, Mika Markus [1 ]
Peccia, Federico Nicolas [2 ]
Thoemmes, Felix [2 ]
Steinmetz, Jannik [1 ]
Biermaier, Valentin [1 ]
Frischknecht, Adrian [1 ]
Bernardo, Paul Palomero [1 ]
Bringmann, Oliver [1 ]
Affiliations
[1] University of Tübingen, Embedded Systems, Tübingen, Baden-Württemberg, Germany
[2] FZI, Karlsruhe, Baden-Württemberg, Germany
Keywords
Deep neural networks; performance estimation; analytical model
DOI
10.1145/3715122
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject Classification Code
0812
Abstract
Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their performance characteristics when executing the intended AI workload. To facilitate this, we present an automated generation approach for fast performance models that accurately estimate the latency of a DNN mapped onto systematically modeled and concisely described accelerator architectures. Using our accelerator architecture description method, we modeled representative DNN accelerators such as Gemmini, UltraTrail, a Plasticine-derived architecture, and a parameterizable systolic array. Together with DNN mappings for those modeled architectures, we perform a combined DNN/hardware dependency graph analysis, which, in the best case, lets us estimate the performance of 4.19 billion instructions by evaluating only 154 loop kernel iterations, yielding a significant speedup. We outperform regression and analytical models in terms of mean absolute percentage error (MAPE) compared with simulation results, while being several orders of magnitude faster than an RTL simulation.
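To make the abstract's central claim concrete, here is a minimal sketch of the underlying idea: rather than simulating every instruction, the cost of one representative iteration per loop kernel is scaled by that kernel's total iteration count. All names (`LoopKernel`, `estimate_latency`) and numbers are illustrative assumptions, not the paper's actual model or API.

```python
from dataclasses import dataclass

@dataclass
class LoopKernel:
    name: str
    iterations: int              # total iterations of this kernel in the DNN mapping
    cycles_per_iteration: float  # cost of one representative iteration (modeled once)

def estimate_latency(kernels: list[LoopKernel]) -> float:
    """Estimate total latency by scaling one representative iteration's
    cost by each kernel's iteration count, instead of simulating all of them."""
    return sum(k.iterations * k.cycles_per_iteration for k in kernels)

# Hypothetical mapping with two loop kernels, each evaluated once.
kernels = [
    LoopKernel("conv2d_inner", iterations=1_000_000, cycles_per_iteration=12.0),
    LoopKernel("fc_inner", iterations=50_000, cycles_per_iteration=4.0),
]
print(estimate_latency(kernels))  # 12200000.0 cycles
```

Only two per-iteration costs are evaluated here, yet the estimate covers over a million iterations, which is the source of the speedup over cycle-accurate RTL simulation.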
Pages: 32