Accelerating Large Sparse Neural Network Inference Using GPU Task Graph Parallelism

Cited by: 13
Authors
Lin, Dian-Lun [1 ]
Huang, Tsung-Wei [1 ]
Affiliation
[1] Univ Utah, Dept Elect & Comp Engn, Salt Lake City, UT 84112 USA
Keywords
Graphics processing units; Kernel; Task analysis; Parallel processing; Programming; Neurons; Data models; Task graph parallelism
DOI
10.1109/TPDS.2021.3138856
CLC Classification Number
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
The ever-increasing size of modern deep neural network (DNN) architectures has put increasing strain on the hardware needed to implement them. Sparsified DNNs can greatly reduce memory costs and increase throughput over standard DNNs, if the loss of accuracy can be adequately controlled. However, sparse DNNs present unique computational challenges: efficient model or data parallelism algorithms are extremely hard to design and implement. The recent MIT/IEEE/Amazon HPEC Graph Challenge has drawn attention to high-performance inference methods for large sparse DNNs. In this article, we introduce SNIG, an efficient inference engine for large sparse DNNs. SNIG develops highly optimized inference kernels and leverages the power of CUDA Graphs to enable efficient decomposition of model and data parallelisms. Our decomposition strategy is flexible and scalable to different partitions of data volumes, model sizes, and GPU numbers. We have evaluated SNIG on the official benchmarks of the HPEC Sparse DNN Challenge and demonstrated its promising performance, scalable from a single GPU to multiple GPUs. Compared to the champion of the 2019 HPEC Sparse DNN Challenge, SNIG can finish all inference workloads using only a single GPU. On the largest DNN, which has more than 4 billion parameters across 1920 layers of 65536 neurons each, SNIG is up to 2.3x faster than a state-of-the-art baseline on a machine with 4 GPUs. SNIG received the Champion Award in the 2020 HPEC Sparse DNN Challenge.
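The per-layer inference step evaluated in the HPEC Sparse DNN Challenge is commonly formulated as a sparse matrix-matrix product followed by a bias, ReLU, and a clipping threshold of 32. The sketch below is a minimal CPU-side illustration of that formulation using SciPy, not SNIG's GPU kernels or CUDA Graph pipeline; the function name `sparse_layer` and the toy matrices are illustrative, and the bias-on-nonzeros convention follows the Challenge's reference implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_layer(Y, W, bias, cap=32.0):
    """One Graph Challenge-style sparse inference layer (illustrative sketch).

    Computes clip(ReLU(Y @ W + bias), cap), where the bias is added only
    to structurally nonzero entries of the product, preserving sparsity.
    """
    Z = (Y @ W).tocsr()                  # sparse feature @ sparse weight
    Z.data += bias                       # bias only on stored (nonzero) entries
    Z.data = np.clip(Z.data, 0.0, cap)   # ReLU (lower bound 0) + activation cap
    Z.eliminate_zeros()                  # drop entries zeroed by ReLU
    return Z

# Toy usage with hypothetical 2x2 inputs:
Y = csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
W = csr_matrix(np.array([[1.0, 1.0], [1.0, 0.0]]))
out = sparse_layer(Y, W, bias=-0.5)
```

A full engine iterates this layer over the whole model; SNIG's contribution lies in partitioning that iteration across batches and GPUs and expressing the resulting dependencies as a CUDA task graph.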
Pages: 3041-3052
Page count: 12
Related Papers
50 items total
  • [1] A Novel Inference Algorithm for Large Sparse Neural Network using Task Graph Parallelism
    Lin, Dian-Lun
    Huang, Tsung-Wei
    2020 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2020
  • [2] Accelerating Sparse Deep Neural Network Inference Using GPU Tensor Cores
    Sun, Yufei
    Zheng, Long
    Wang, Qinggang
    Ye, Xiangyu
    Huang, Yu
    Yao, Pengcheng
    Liao, Xiaofei
    Jin, Hai
    2022 IEEE HIGH PERFORMANCE EXTREME COMPUTING VIRTUAL CONFERENCE (HPEC), 2022
  • [3] SNICIT: Accelerating Sparse Neural Network Inference via Compression at Inference Time on GPU
    Jiang, Shui
    Huang, Tsung-Wei
    Yu, Bei
    Ho, Tsung-Yi
    PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023, 2023, : 51 - 61
  • [4] Efficient GPU Computation Using Task Graph Parallelism
    Lin, Dian-Lun
    Huang, Tsung-Wei
    EURO-PAR 2021: PARALLEL PROCESSING, 2021, 12820 : 435 - 450
  • [5] Accelerating Graph Neural Networks using GPU
    Nayak, Niharika
    Jatala, Vishwesh
    2022 IEEE 29TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA AND ANALYTICS WORKSHOP, HIPCW, 2022, : 73 - 73
  • [6] Accelerating large graph algorithms on the GPU using CUDA
    Harish, Pawan
    Narayanan, P. J.
    HIGH PERFORMANCE COMPUTING - HIPC 2007, PROCEEDINGS, 2007, 4873 : 197 - 208
  • [7] A GPU Implementation of the Sparse Deep Neural Network Graph Challenge
    Bisson, Mauro
    Fatica, Massimiliano
    2019 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2019
  • [8] GDL-GNN: Applying GPU Dataloading of Large Datasets for Graph Neural Network Inference
    Dang, Haoran
    Wu, Meng
    Yan, Mingyu
    Ye, Xiaochun
    Fan, Dongrui
    EURO-PAR 2024: PARALLEL PROCESSING, PART II, EURO-PAR 2024, 2024, 14802 : 346 - 361
  • [9] Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU
    Chen, Hanqiu
    Alhinai, Yahya
    Jiang, Yihan
    Na, Eunjee
    Hao, Cong
    2022 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2022), 2022, : 130 - 145
  • [10] Data Parallel Large Sparse Deep Neural Network on GPU
    Sattar, Naw Safrin
    Arifuzzaman, Shaikh
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020), 2020, : 1006 - 1014