Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial

Cited by: 1
Authors
Mao, Wendong [1 ]
Wang, Meiqi [1 ]
Xie, Xiaoru [2 ]
Wu, Xiao [2 ]
Wang, Zhongfeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Integrated Circuits, Shenzhen Campus, Shenzhen 518107, Guangdong, Peoples R China
[2] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210008, Peoples R China
Keywords
Hardware acceleration; sparsity; CNN; Transformer; tutorial; deep learning; flexible accelerator; neural networks; efficient; architecture
DOI
10.1109/TCSII.2023.3344681
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Codes
0808; 0809
Abstract
Deep neural networks (DNNs) are widely used in many fields, such as artificial intelligence generated content (AIGC) and robotics. To support these tasks efficiently, model pruning techniques have been developed to compress computation- and memory-intensive DNNs. However, directly executing the resulting sparse models on a conventional hardware accelerator can cause significant under-utilization, since invalid data arising from the sparse patterns leads to unnecessary computations and irregular memory accesses. This brief analyzes the critical issues in accelerating sparse models and provides an overview of typical hardware designs for various sparse DNNs, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and Transformers. Following the overview, we give practical guidelines for designing efficient accelerators for sparse DNNs, with qualitative metrics to evaluate hardware overhead in different cases. In addition, we highlight potential opportunities for hardware/software/algorithm co-optimization from the perspective of sparse DNN implementation, and provide insights into recent design trends for the efficient implementation of Transformers with sparse attention, which facilitates large language model (LLM) deployments with high throughput and energy efficiency.
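To make the under-utilization issue described above concrete, the following is a minimal Python sketch (not taken from the paper; the CSR layout, function names, and the 75%-sparse example matrix are illustrative assumptions). It contrasts a dense matrix-vector product, which spends multiply-accumulate operations on pruned zero weights, with a compressed-sparse-row (CSR) version that stores and processes only non-zero weights; the row-dependent inner loop hints at the load imbalance and irregular memory accesses that sparse accelerators must handle in hardware.

    # Minimal sketch (illustrative, not the paper's method): CSR-based sparse
    # mat-vec that skips pruned (zero) weights, versus a dense W @ x baseline.
    import numpy as np

    def to_csr(W):
        """Compress a pruned weight matrix into CSR (values, column indices, row pointers)."""
        values, col_idx, row_ptr = [], [], [0]
        for row in W:
            nz = np.nonzero(row)[0]          # positions of surviving weights in this row
            values.extend(row[nz])
            col_idx.extend(nz)
            row_ptr.append(len(values))      # cumulative non-zero count per row
        return np.array(values), np.array(col_idx, dtype=int), np.array(row_ptr, dtype=int)

    def csr_matvec(values, col_idx, row_ptr, x):
        """y = W @ x touching only non-zero weights; work per row varies with sparsity."""
        y = np.zeros(len(row_ptr) - 1)
        for r in range(len(y)):
            start, end = row_ptr[r], row_ptr[r + 1]
            # Gather of x[col_idx[...]] is the irregular memory access pattern.
            y[r] = np.dot(values[start:end], x[col_idx[start:end]])
        return y

    # Example: a 4x8 weight matrix pruned to roughly 75% sparsity.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 8)) * (rng.random((4, 8)) < 0.25)
    x = rng.standard_normal(8)
    vals, cols, ptrs = to_csr(W)
    assert np.allclose(csr_matvec(vals, cols, ptrs, x), W @ x)

In this toy form the compressed loop performs only the non-zero multiplications, but the per-row workload and the gathered accesses to x are irregular, which is precisely why dense accelerators stall on pruned models and why dedicated sparse dataflows are studied in the brief.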
Pages: 1708-1714
Page count: 7