Accelerating Large Sparse Neural Network Inference Using GPU Task Graph Parallelism

Cited: 13
Authors
Lin, Dian-Lun [1 ]
Huang, Tsung-Wei [1 ]
Affiliations
[1] Univ Utah, Dept Elect & Comp Engn, Salt Lake City, UT 84112 USA
Keywords
Graphics processing units; Kernel; Task analysis; Parallel processing; Programming; Neurons; Data models; Task graph parallelism
DOI
10.1109/TPDS.2021.3138856
Chinese Library Classification
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
The ever-increasing size of modern deep neural network (DNN) architectures has put increasing strain on the hardware needed to implement them. Sparsified DNNs can greatly reduce memory costs and increase throughput over standard DNNs, if the loss of accuracy can be adequately controlled. However, sparse DNNs present unique computational challenges, and efficient model- or data-parallel algorithms are extremely hard to design and implement. The recent MIT/IEEE/Amazon HPEC Graph Challenge has drawn attention to high-performance inference methods for large sparse DNNs. In this article, we introduce SNIG, an efficient inference engine for large sparse DNNs. SNIG develops highly optimized inference kernels and leverages the power of CUDA Graphs to enable efficient decomposition of model and data parallelism. Our decomposition strategy is flexible and scales across different data volumes, model sizes, and numbers of GPUs. We have evaluated SNIG on the official benchmarks of the HPEC Sparse DNN Challenge and demonstrated promising performance that scales from a single GPU to multiple GPUs. Compared to the champion of the 2019 HPEC Sparse DNN Challenge, SNIG can finish all inference workloads using only a single GPU. On the largest DNN, which has more than 4 billion parameters across 1920 layers of 65536 neurons each, SNIG is up to 2.3x faster than a state-of-the-art baseline on a machine with 4 GPUs. SNIG received the Champion Award in the 2020 HPEC Sparse DNN Challenge.
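The abstract's key systems idea, replaying a whole multi-layer inference pipeline as a single CUDA Graph rather than launching each layer's kernel individually, can be illustrated with a minimal stream-capture sketch. This is a toy under stated assumptions, not SNIG's actual implementation: infer_layer is a hypothetical placeholder for one sparse layer's kernel, and the sizes are arbitrary.

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one sparse layer's inference kernel
// (a placeholder ReLU, not SNIG's optimized sparse kernels).
__global__ void infer_layer(float* y, const float* x, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = fmaxf(x[i], 0.0f);
}

int main() {
  const int n = 1 << 20;   // neurons per layer (illustrative)
  const int layers = 4;    // layer count (illustrative)
  float *a = nullptr, *b = nullptr;
  cudaMalloc(&a, n * sizeof(float));
  cudaMalloc(&b, n * sizeof(float));

  cudaStream_t s;
  cudaStreamCreate(&s);

  // Capture the per-layer kernel launches into one graph, so the
  // whole pipeline later replays with a single launch call.
  cudaGraph_t graph;
  cudaGraphExec_t exec;
  cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
  for (int l = 0; l < layers; ++l) {
    infer_layer<<<(n + 255) / 256, 256, 0, s>>>(b, a, n);
    float* t = a; a = b; b = t;   // ping-pong buffers between layers
  }
  cudaStreamEndCapture(s, &graph);
  // CUDA 11-style signature; on CUDA 12+ use cudaGraphInstantiate(&exec, graph, 0).
  cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);

  // Each replay runs all captured kernels with one launch,
  // amortizing per-kernel launch overhead across the pipeline.
  for (int batch = 0; batch < 10; ++batch)
    cudaGraphLaunch(exec, s);
  cudaStreamSynchronize(s);

  cudaGraphExecDestroy(exec);
  cudaGraphDestroy(graph);
  cudaStreamDestroy(s);
  cudaFree(a); cudaFree(b);
  printf("done\n");
  return 0;
}

For pipelines on the scale of the challenge's largest benchmark (1920 layers), capturing once and replaying per input batch removes most launch overhead; SNIG's decomposition additionally partitions the model and data across multiple GPUs, which this single-stream sketch does not attempt.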
Pages: 3041 - 3052
Page count: 12
Related papers
50 entries in total
  • [31] Accelerating Neural Network Training with Processing-in-Memory GPU
    Fei, Xiang
    Han, Jianhui
    Huang, Jianqiang
    Zheng, Weimin
    Zhang, Youhui
    2022 22ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2022), 2022, : 414 - 421
  • [32] Fast Sparse GPU Kernels for Accelerated Training of Graph Neural Networks
Fan, Ruibo
    Wang, Wei
    Chu, Xiaowen
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 501 - 511
  • [33] GPU Occupancy Prediction of Deep Learning Models Using Graph Neural Network
    Mei, Hengquan
    Qu, Huaizhi
    Sun, Jingwei
    Gao, Yanjie
    Lin, Haoxiang
    Sun, Guangzhong
    2023 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, CLUSTER, 2023, : 318 - 329
  • [34] Sparse Deep Neural Network Graph Challenge
    Kepner, Jeremy
    Alford, Simon
    Gadepally, Vijay
    Jones, Michael
    Milechin, Lauren
    Robinett, Ryan
    Samsi, Sid
2019 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2019
  • [35] Accelerating Maximal Bicliques Enumeration with GPU on large scale network
    Wu, Chunqi
    Li, Jingdong
    Li, Zhao
    Zhang, Ji
    Tang, Pan
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 161 : 601 - 613
  • [36] Towards parallelism detection of sequential programs with graph neural network
    Shen, Yuanyuan
    Peng, Manman
    Wang, Shiling
    Wu, Qiang
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2021, 125 : 515 - 525
  • [37] QGTC: Accelerating Quantized Graph Neural Networks via GPU Tensor Core
    Wang, Yuke
    Feng, Boyuan
    Ding, Yufei
    PPOPP'22: PROCEEDINGS OF THE 27TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2022, : 107 - 119
  • [38] Large-scale Memory of Sequences using Binary Sparse Neural Networks on GPU
    Marques, Max Raphael Sobroza
    Hacene, Ghouthi Boukli
    Lassance, Carlos Eduardo Rosar Kos
    Horrein, Pierre-Henri
    2017 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2017, : 553 - 559
  • [39] Accelerating Virtual Network Embedding with Graph Neural Networks
    Habibi, Farzad
    Dolati, Mahdi
    Khonsari, Ahmad
    Ghaderi, Majid
2020 16TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), 2020
  • [40] POSTER: ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters
    Li, Shunde
    Gu, Junyu
    Wang, Jue
    Yao, Tiechui
    Liang, Zhiqiang
    Shi, Yumeng
    Li, Shigang
    Xi, Weiting
    Li, Shushen
    Zhou, Chunbao
    Wang, Yangang
    Chi, Xuebin
    PROCEEDINGS OF THE 29TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2024, 2024, : 469 - 471