SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training

Cited by: 239
Authors:
Qin, Eric [1 ]
Samajdar, Ananda [1 ]
Kwon, Hyoukjun [1 ]
Nadella, Vineet [1 ]
Srinivasan, Sudarshan [2 ]
Das, Dipankar [2 ]
Kaul, Bharat [2 ]
Krishna, Tushar [1 ]
Affiliations:
[1] Georgia Institute of Technology, Atlanta, GA 30332, USA
[2] Intel, Santa Clara, CA, USA
Keywords:
DOI:
10.1109/HPCA47549.2020.00015
CLC Number:
TP3 [Computing technology, computer technology];
Subject Classification Code:
0812;
Abstract
The advent of Deep Learning (DL) has radically transformed the computing industry across the entire spectrum from algorithms to circuits. As myriad application domains embrace DL, it has become synonymous with a genre of workloads spanning vision, speech, language, recommendations, robotics, and games. The key compute kernel within most DL workloads is the general matrix-matrix multiplication (GEMM), which appears frequently during both the forward pass (inference and training) and the backward pass (training). GEMMs are a natural choice for hardware acceleration to speed up training, and have led to 2D systolic architectures such as NVIDIA Tensor Cores and the Google Tensor Processing Unit (TPU). Unfortunately, emerging GEMMs in DL are highly irregular and sparse, which leads to poor data mappings on systolic architectures. This paper proposes SIGMA, a flexible and scalable architecture that offers high utilization of all its processing elements (PEs) regardless of kernel shape and sparsity. SIGMA includes a novel reduction-tree microarchitecture named Forwarding Adder Network (FAN). SIGMA performs 5.7× better than systolic array architectures for irregular sparse matrices, and roughly 3× better than state-of-the-art sparse accelerators. We demonstrate an instance of SIGMA operating at 10.8 TFLOPS efficiency across arbitrary levels of sparsity, with a 65.10 mm² and 22.33 W footprint on a 28 nm process.
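To make the utilization argument concrete, here is a minimal sketch (not from the paper; a simplified occupancy model with hypothetical dimensions chosen only for illustration) that contrasts a rigid 128x128 systolic array, which must tile an irregular GEMM onto its fixed grid and cannot skip zero operands, with an idealized flexible fabric in the spirit of SIGMA that schedules only nonzero multiply-accumulates onto its PEs.

    import math

    def systolic_utilization(M, K, N, rows=128, cols=128):
        """Fraction of PE-cycles doing useful MACs when an (M,K) x (K,N) GEMM is
        tiled onto a rows x cols systolic array; zeros still occupy PEs and
        pipeline fill/drain is ignored (simplifying assumption)."""
        useful = M * K * N                                   # dense MACs required
        tiles = math.ceil(M / rows) * math.ceil(N / cols)    # output tiles mapped
        occupied = tiles * rows * cols * K                   # PE-cycles reserved
        return useful / occupied

    def flexible_utilization(M, K, N, density, num_pes=128 * 128):
        """Idealized flexible fabric: only nonzero MACs are scheduled, and the
        work can be packed onto any subset of PEs (SIGMA-like goal, simplified)."""
        useful = M * K * N * density                         # nonzero MACs only
        occupied = math.ceil(useful / num_pes) * num_pes     # whole-fabric rounds
        return useful / occupied

    # Hypothetical tall, skinny GEMM at 80% sparsity (20% density).
    M, K, N, density = 1024, 64, 32, 0.2
    print(f"rigid systolic array : {systolic_utilization(M, K, N):.1%}")           # ~25%
    print(f"idealized flexible   : {flexible_utilization(M, K, N, density):.1%}")  # ~98%

In this toy model the rigid array sits near 25% utilization simply because N = 32 is smaller than the array width, and its effective throughput drops further once the 80% zero operands are counted, while the flexible fabric stays close to full utilization. This is only an illustrative occupancy model, not the paper's analytical model or its measured 5.7x/3x results.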
Pages: 58-70
Page count: 13
Related Papers (49 total)
[1] Mao, Wendong; Wang, Meiqi; Xie, Xiaoru; Wu, Xiao; Wang, Zhongfeng. Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024, 71(3): 1708-1714.
[2] Wang, Bo; Ma, Sheng; Luo, Shengbai; Wu, Lizhou; Zhang, Jianmin; Zhang, Chunyuan; Li, Tiejun. SparGD: A Sparse GEMM Accelerator with Dynamic Dataflow. ACM Transactions on Design Automation of Electronic Systems, 2024, 29(2).
[3] Mao, Yingchang; Liu, Qiang; Cheung, Ray C. C. MSCA: A Multi-grained Sparse Convolution Accelerator for DNN Training. 2024 IEEE 35th International Conference on Application-Specific Systems, Architectures and Processors (ASAP 2024), 2024: 34-35.
[4] Colangelo, Philip; Sengupta, Shayan; Margala, Martin. Sparse Persistent GEMM Accelerator using OpenCL for Intel FPGAs. 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020.
[5] Lu, Jinming; Lin, Jun; Wang, Zhongfeng. A Reconfigurable DNN Training Accelerator on FPGA. 2020 IEEE Workshop on Signal Processing Systems (SiPS), 2020: 94-99.
[6] Yu, Jiangnan; Fan, Yang; Wang, Hanfei; Qiao, Yuxuan; Wu, Zheng; Xiong, Xiankui; Yao, Xiao; Yao, Haidong; Zhang, Yecheng. FullSparse: A Sparse-Aware GEMM Accelerator with Online Sparsity Prediction. Proceedings of the 21st ACM International Conference on Computing Frontiers (CF 2024), 2024: 298-301.
[7] Lee, Gunhee; Park, Hanmin; Ryu, Soojung; Lee, Hyuk-Jae. Acceleration of DNN Training Regularization: Dropout Accelerator. 2020 International Conference on Electronics, Information, and Communication (ICEIC), 2020.
[8] Noh, Seock-Hwan; Koo, Jahyun; Lee, Seunghyun; Park, Jongse; Kung, Jaeha. FlexBlock: A Flexible DNN Training Accelerator With Multi-Mode Block Floating Point Support. IEEE Transactions on Computers, 2023, 72(9): 2522-2535.
[9] You, Weijie; Chen, Deming; Wu, Chang. A Flexible DNN Accelerator Design with Layer Pipeline for FPGAs. 2019 6th International Conference on Information Science and Control Engineering (ICISCE 2019), 2019: 959-962.
[10] Zhang, Danqing; Li, Baiting; Wang, Hang; Zhang, Xuchong; Sun, Hongbin. An Efficient Sparse-Aware Summation Optimization Strategy for DNN Accelerator. 2024 IEEE International Symposium on Circuits and Systems (ISCAS 2024), 2024.