Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial

Cited by: 1
Authors
Mao, Wendong [1 ]
Wang, Meiqi [1 ]
Xie, Xiaoru [2 ]
Wu, Xiao [2 ]
Wang, Zhongfeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Integrated Circuits, Shenzhen Campus, Shenzhen 518107, Guangdong, Peoples R China
[2] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210008, Peoples R China
Keywords
Hardware acceleration; sparsity; CNN; Transformer; tutorial; deep learning; flexible accelerator; neural networks; efficient; architecture
DOI
10.1109/TCSII.2023.3344681
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Deep neural networks (DNNs) are widely used in many fields, such as artificial intelligence generated content (AIGC) and robotics. To support these tasks efficiently, model pruning techniques have been developed to compress computation- and memory-intensive DNNs. However, directly executing the resulting sparse models on a conventional hardware accelerator can cause significant under-utilization, since invalid data arising from the sparse patterns leads to unnecessary computations and irregular memory accesses. This brief analyzes the critical issues in accelerating sparse models and provides an overview of typical hardware designs for various sparse DNNs, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and Transformers. Following the overview, we give practical guidelines for designing efficient accelerators for sparse DNNs, with qualitative metrics to evaluate hardware overhead in different cases. In addition, we highlight potential opportunities for hardware/software/algorithm co-optimization from the perspective of sparse DNN implementation, and provide insights into recent design trends for the efficient implementation of Transformers with sparse attention, which facilitates large language model (LLM) deployments with high throughput and energy efficiency.
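To make the under-utilization issue described above concrete, the following is a minimal Python sketch (not taken from the paper; the CSR layout, function names, and the 75%-sparse example matrix are illustrative assumptions). It contrasts a dense matrix-vector product, which spends multiply-accumulate operations on pruned zero weights, with a compressed-sparse-row (CSR) version that stores and processes only non-zero weights; the row-dependent inner loop hints at the load imbalance and irregular memory accesses that sparse accelerators must handle in hardware.

    # Minimal sketch (illustrative, not the paper's method): CSR-based sparse
    # mat-vec that skips pruned (zero) weights, versus a dense W @ x baseline.
    import numpy as np

    def to_csr(W):
        """Compress a pruned weight matrix into CSR (values, column indices, row pointers)."""
        values, col_idx, row_ptr = [], [], [0]
        for row in W:
            nz = np.nonzero(row)[0]          # positions of surviving weights in this row
            values.extend(row[nz])
            col_idx.extend(nz)
            row_ptr.append(len(values))      # cumulative non-zero count per row
        return np.array(values), np.array(col_idx, dtype=int), np.array(row_ptr, dtype=int)

    def csr_matvec(values, col_idx, row_ptr, x):
        """y = W @ x touching only non-zero weights; work per row varies with sparsity."""
        y = np.zeros(len(row_ptr) - 1)
        for r in range(len(y)):
            start, end = row_ptr[r], row_ptr[r + 1]
            # Gather of x[col_idx[...]] is the irregular memory access pattern.
            y[r] = np.dot(values[start:end], x[col_idx[start:end]])
        return y

    # Example: a 4x8 weight matrix pruned to roughly 75% sparsity.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 8)) * (rng.random((4, 8)) < 0.25)
    x = rng.standard_normal(8)
    vals, cols, ptrs = to_csr(W)
    assert np.allclose(csr_matvec(vals, cols, ptrs, x), W @ x)

In this toy form the compressed loop performs only the non-zero multiplications, but the per-row workload and the gathered accesses to x are irregular, which is precisely why dense accelerators stall on pruned models and why dedicated sparse dataflows are studied in the brief.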
Pages: 1708-1714
Page count: 7