Accelerating Large Sparse Neural Network Inference Using GPU Task Graph Parallelism

Cited by: 13
Authors
Lin, Dian-Lun [1 ]
Huang, Tsung-Wei [1 ]
Affiliation
[1] Univ Utah, Dept Elect & Comp Engn, Salt Lake City, UT 84112 USA
Keywords
Graphics processing units; Kernel; Task analysis; Parallel processing; Programming; Neurons; Data models; Task graph parallelism
DOI
10.1109/TPDS.2021.3138856
CLC Classification Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
The ever-increasing size of modern deep neural network (DNN) architectures has put increasing strain on the hardware needed to implement them. Sparsified DNNs can greatly reduce memory costs and increase throughput over standard DNNs, if the loss of accuracy can be adequately controlled. However, sparse DNNs present unique computational challenges: efficient model or data parallelism algorithms are extremely hard to design and implement. The recent MIT/IEEE/Amazon HPEC Graph Challenge has drawn attention to high-performance inference methods for large sparse DNNs. In this article, we introduce SNIG, an efficient inference engine for large sparse DNNs. SNIG develops highly optimized inference kernels and leverages the power of CUDA Graphs to enable efficient decomposition of model and data parallelisms. Our decomposition strategy is flexible and scalable to different partitions of data volumes, model sizes, and GPU numbers. We have evaluated SNIG on the official benchmarks of the HPEC Sparse DNN Challenge and demonstrated its promising performance, scalable from a single GPU to multiple GPUs. Compared to the champion of the 2019 HPEC Sparse DNN Challenge, SNIG can finish all inference workloads using only a single GPU. On the largest DNN, which has more than 4 billion parameters across 1920 layers of 65536 neurons each, SNIG is up to 2.3x faster than a state-of-the-art baseline on a machine with 4 GPUs. SNIG received the Champion Award in the 2020 HPEC Sparse DNN Challenge.
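The per-layer inference step evaluated in the HPEC Sparse DNN Challenge is commonly formulated as a sparse matrix-matrix product followed by a bias, ReLU, and a clipping threshold of 32. The sketch below is a minimal CPU-side illustration of that formulation using SciPy, not SNIG's GPU kernels or CUDA Graph pipeline; the function name `sparse_layer` and the toy matrices are illustrative, and the bias-on-nonzeros convention follows the Challenge's reference implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_layer(Y, W, bias, cap=32.0):
    """One Graph Challenge-style sparse inference layer (illustrative sketch).

    Computes clip(ReLU(Y @ W + bias), cap), where the bias is added only
    to structurally nonzero entries of the product, preserving sparsity.
    """
    Z = (Y @ W).tocsr()                  # sparse feature @ sparse weight
    Z.data += bias                       # bias only on stored (nonzero) entries
    Z.data = np.clip(Z.data, 0.0, cap)   # ReLU (lower bound 0) + activation cap
    Z.eliminate_zeros()                  # drop entries zeroed by ReLU
    return Z

# Toy usage with hypothetical 2x2 inputs:
Y = csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
W = csr_matrix(np.array([[1.0, 1.0], [1.0, 0.0]]))
out = sparse_layer(Y, W, bias=-0.5)
```

A full engine iterates this layer over the whole model; SNIG's contribution lies in partitioning that iteration across batches and GPUs and expressing the resulting dependencies as a CUDA task graph.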
Pages: 3041-3052
Page count: 12
Related Papers
50 items total
  • [1] A Novel Inference Algorithm for Large Sparse Neural Network using Task Graph Parallelism
    Lin, Dian-Lun
    Huang, Tsung-Wei
    2020 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2020
  • [2] Accelerating Sparse Deep Neural Network Inference Using GPU Tensor Cores
    Sun, Yufei
    Zheng, Long
    Wang, Qinggang
    Ye, Xiangyu
    Huang, Yu
    Yao, Pengcheng
    Liao, Xiaofei
    Jin, Hai
    2022 IEEE HIGH PERFORMANCE EXTREME COMPUTING VIRTUAL CONFERENCE (HPEC), 2022
  • [3] SNICIT: Accelerating Sparse Neural Network Inference via Compression at Inference Time on GPU
    Jiang, Shui
    Huang, Tsung-Wei
    Yu, Bei
    Ho, Tsung-Yi
    PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023, 2023, : 51 - 61
  • [4] Efficient GPU Computation Using Task Graph Parallelism
    Lin, Dian-Lun
    Huang, Tsung-Wei
    EURO-PAR 2021: PARALLEL PROCESSING, 2021, 12820 : 435 - 450
  • [5] Accelerating Graph Neural Networks using GPU
    Nayak, Niharika
    Jatala, Vishwesh
    2022 IEEE 29TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA AND ANALYTICS WORKSHOP, HIPCW, 2022, : 73 - 73
  • [6] Accelerating large graph algorithms on the GPU using CUDA
    Harish, Pawan
    Narayanan, P. J.
    HIGH PERFORMANCE COMPUTING - HIPC 2007, PROCEEDINGS, 2007, 4873 : 197 - 208
  • [7] A GPU Implementation of the Sparse Deep Neural Network Graph Challenge
    Bisson, Mauro
    Fatica, Massimiliano
    2019 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2019
  • [8] GDL-GNN: Applying GPU Dataloading of Large Datasets for Graph Neural Network Inference
    Dang, Haoran
    Wu, Meng
    Yan, Mingyu
    Ye, Xiaochun
    Fan, Dongrui
    EURO-PAR 2024: PARALLEL PROCESSING, PART II, EURO-PAR 2024, 2024, 14802 : 346 - 361
  • [9] Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU
    Chen, Hanqiu
    Alhinai, Yahya
    Jiang, Yihan
    Na, Eunjee
    Hao, Cong
    2022 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2022), 2022, : 130 - 145
  • [10] Data Parallel Large Sparse Deep Neural Network on GPU
    Sattar, Naw Safrin
    Arifuzzaman, Shaikh
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020), 2020, : 1006 - 1014