Accelerating Large Sparse Neural Network Inference Using GPU Task Graph Parallelism

Cited: 13
Authors
Lin, Dian-Lun [1 ]
Huang, Tsung-Wei [1 ]
Affiliations
[1] Univ Utah, Dept Elect & Comp Engn, Salt Lake City, UT 84112 USA
Keywords
Graphics processing units; Kernel; Task analysis; Parallel processing; Programming; Neurons; Data models; Task graph parallelism
DOI
10.1109/TPDS.2021.3138856
Chinese Library Classification
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
The ever-increasing size of modern deep neural network (DNN) architectures has put increasing strain on the hardware needed to implement them. Sparsified DNNs can greatly reduce memory costs and increase throughput over standard DNNs, if the loss of accuracy can be adequately controlled. However, sparse DNNs present unique computational challenges, and efficient model- or data-parallel algorithms are extremely hard to design and implement. The recent MIT/IEEE/Amazon HPEC Graph Challenge has drawn attention to high-performance inference methods for large sparse DNNs. In this article, we introduce SNIG, an efficient inference engine for large sparse DNNs. SNIG develops highly optimized inference kernels and leverages the power of CUDA Graphs to enable efficient decomposition of model and data parallelism. Our decomposition strategy is flexible and scales across different data volumes, model sizes, and numbers of GPUs. We have evaluated SNIG on the official benchmarks of the HPEC Sparse DNN Challenge and demonstrated promising performance that scales from a single GPU to multiple GPUs. Compared to the champion of the 2019 HPEC Sparse DNN Challenge, SNIG can finish all inference workloads using only a single GPU. On the largest DNN, which has more than 4 billion parameters across 1920 layers of 65536 neurons each, SNIG is up to 2.3x faster than a state-of-the-art baseline on a machine with 4 GPUs. SNIG received the Champion Award in the 2020 HPEC Sparse DNN Challenge.
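The abstract's key systems idea, replaying a whole multi-layer inference pipeline as a single CUDA Graph rather than launching each layer's kernel individually, can be illustrated with a minimal stream-capture sketch. This is a toy under stated assumptions, not SNIG's actual implementation: infer_layer is a hypothetical placeholder for one sparse layer's kernel, and the sizes are arbitrary.

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one sparse layer's inference kernel
// (a placeholder ReLU, not SNIG's optimized sparse kernels).
__global__ void infer_layer(float* y, const float* x, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = fmaxf(x[i], 0.0f);
}

int main() {
  const int n = 1 << 20;   // neurons per layer (illustrative)
  const int layers = 4;    // layer count (illustrative)
  float *a = nullptr, *b = nullptr;
  cudaMalloc(&a, n * sizeof(float));
  cudaMalloc(&b, n * sizeof(float));

  cudaStream_t s;
  cudaStreamCreate(&s);

  // Capture the per-layer kernel launches into one graph, so the
  // whole pipeline later replays with a single launch call.
  cudaGraph_t graph;
  cudaGraphExec_t exec;
  cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
  for (int l = 0; l < layers; ++l) {
    infer_layer<<<(n + 255) / 256, 256, 0, s>>>(b, a, n);
    float* t = a; a = b; b = t;   // ping-pong buffers between layers
  }
  cudaStreamEndCapture(s, &graph);
  // CUDA 11-style signature; on CUDA 12+ use cudaGraphInstantiate(&exec, graph, 0).
  cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);

  // Each replay runs all captured kernels with one launch,
  // amortizing per-kernel launch overhead across the pipeline.
  for (int batch = 0; batch < 10; ++batch)
    cudaGraphLaunch(exec, s);
  cudaStreamSynchronize(s);

  cudaGraphExecDestroy(exec);
  cudaGraphDestroy(graph);
  cudaStreamDestroy(s);
  cudaFree(a); cudaFree(b);
  printf("done\n");
  return 0;
}

For pipelines on the scale of the challenge's largest benchmark (1920 layers), capturing once and replaying per input batch removes most launch overhead; SNIG's decomposition additionally partitions the model and data across multiple GPUs, which this single-stream sketch does not attempt.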
Pages: 3041 - 3052
Page count: 12
Related papers
50 entries in total
  • [31] Accelerating Neural Network Training with Processing-in-Memory GPU
    Fei, Xiang
    Han, Jianhui
    Huang, Jianqiang
    Zheng, Weimin
    Zhang, Youhui
    2022 22ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2022), 2022, : 414 - 421
  • [32] Fast Sparse GPU Kernels for Accelerated Training of Graph Neural Networks
Fan, Ruibo
    Wang, Wei
    Chu, Xiaowen
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 501 - 511
  • [33] GPU Occupancy Prediction of Deep Learning Models Using Graph Neural Network
    Mei, Hengquan
    Qu, Huaizhi
    Sun, Jingwei
    Gao, Yanjie
    Lin, Haoxiang
    Sun, Guangzhong
    2023 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, CLUSTER, 2023, : 318 - 329
  • [34] Sparse Deep Neural Network Graph Challenge
    Kepner, Jeremy
    Alford, Simon
    Gadepally, Vijay
    Jones, Michael
    Milechin, Lauren
    Robinett, Ryan
    Samsi, Sid
2019 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2019
  • [35] Accelerating Maximal Bicliques Enumeration with GPU on large scale network
    Wu, Chunqi
    Li, Jingdong
    Li, Zhao
    Zhang, Ji
    Tang, Pan
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 161 : 601 - 613
  • [36] Towards parallelism detection of sequential programs with graph neural network
    Shen, Yuanyuan
    Peng, Manman
    Wang, Shiling
    Wu, Qiang
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2021, 125 : 515 - 525
  • [37] QGTC: Accelerating Quantized Graph Neural Networks via GPU Tensor Core
    Wang, Yuke
    Feng, Boyuan
    Ding, Yufei
    PPOPP'22: PROCEEDINGS OF THE 27TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2022, : 107 - 119
  • [38] Large-scale Memory of Sequences using Binary Sparse Neural Networks on GPU
    Marques, Max Raphael Sobroza
    Hacene, Ghouthi Boukli
    Lassance, Carlos Eduardo Rosar Kos
    Horrein, Pierre-Henri
    2017 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2017, : 553 - 559
  • [39] Accelerating Virtual Network Embedding with Graph Neural Networks
    Habibi, Farzad
    Dolati, Mahdi
    Khonsari, Ahmad
    Ghaderi, Majid
2020 16TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), 2020
  • [40] POSTER: ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters
    Li, Shunde
    Gu, Junyu
    Wang, Jue
    Yao, Tiechui
    Liang, Zhiqiang
    Shi, Yumeng
    Li, Shigang
    Xi, Weiting
    Li, Shushen
    Zhou, Chunbao
    Wang, Yangang
    Chi, Xuebin
    PROCEEDINGS OF THE 29TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2024, 2024, : 469 - 471