Accelerating matrix-centric graph processing on GPUs through bit-level optimizations

Times cited: 0

Authors:

Chen, Jou-An [1]

Sung, Hsin-Hsuan [1]

Shen, Xipeng [1]

Tallent, Nathan [2]

Barker, Kevin [2]

Li, Ang [2]

Affiliations:

[1] North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA

[2] Pacific Northwest Natl Lab, Richland, WA USA

Source:

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING | 2023 / Vol. 177

Funding:

National Science Foundation (USA);

Keywords:

GraphBLAS; Bit manipulation; GPU; Sparse matrix; Deep reinforcement learning;

DOI:

10.1016/j.jpdc.2023.02.013

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Subject classification code:

081202 ;

Abstract:

Even though binary values are well known to be common in graph applications (e.g., adjacency matrices), how to leverage this phenomenon for efficiency has not yet been adequately explored. This paper presents a systematic study of how to unlock the potential of bit-level optimizations for graph computations that involve binary values. It proposes a two-level representation named Bit-Block Compressed Sparse Row (B2SR) and presents a series of optimizations to graph operations on B2SR that exploit the intrinsics of modern GPUs. It additionally introduces Deep Reinforcement Learning (DRL) as an efficient way to best configure the bit-level optimizations on the fly. The DQN-based adaptive tile-size selector, with dedicated model training, can reach 68% prediction accuracy. Evaluations on NVIDIA Pascal and Volta GPUs show that the optimizations bring speedups of up to 40x and 6555x for the essential GraphBLAS kernels SpMV and SpGEMM, respectively, accelerating GraphBLAS-based BFS by up to 433x, SSSP, PR, and CC by up to 35x, and TC by up to 52x. (c) 2023 Elsevier Inc. All rights reserved.
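The abstract describes B2SR only at a high level. The following is a minimal CPU sketch, under our own assumptions, of what a two-level bit-block CSR layout and a popcount-based SpMV over a binary matrix might look like: the top level is a block-CSR index over t x t tiles, and each nonzero tile is stored as a single integer bitmap. The function names `to_bit_blocks` and `bit_spmv`, the row-major bit ordering inside a tile, and the plus-times (neighbor-counting) semantics are hypothetical choices for illustration, not the paper's actual implementation. On a GPU, the `bin(...).count("1")` call would roughly correspond to the CUDA `__popc` intrinsic.

```python
from typing import List, Tuple


def to_bit_blocks(dense: List[List[int]], t: int) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical B2SR-like packing of a square 0/1 matrix.

    Level 1: block-CSR arrays (row_ptr, col_idx) over t x t tiles.
    Level 2: each nonzero tile is one integer bitmap, assuming a row-major
    layout where element (r, c) of the tile maps to bit r * t + c.
    """
    n = len(dense)
    nb = (n + t - 1) // t  # tile rows/cols; the matrix is implicitly zero-padded
    row_ptr, col_idx, tiles = [0], [], []
    for bi in range(nb):
        for bj in range(nb):
            bits = 0
            for r in range(t):
                for c in range(t):
                    i, j = bi * t + r, bj * t + c
                    if i < n and j < n and dense[i][j]:
                        bits |= 1 << (r * t + c)
            if bits:  # only tiles containing at least one 1 are stored
                col_idx.append(bj)
                tiles.append(bits)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, tiles


def bit_spmv(n: int, t: int, row_ptr: List[int], col_idx: List[int],
             tiles: List[int], x: List[int]) -> List[int]:
    """y = A @ x for a 0/1 matrix A and 0/1 vector x, using AND + popcount per tile row."""
    nb = len(row_ptr) - 1
    mask = (1 << t) - 1
    # Pack x into one t-bit word per tile column (bit c <-> column bj * t + c).
    xw = [0] * nb
    for j, v in enumerate(x):
        if v:
            xw[j // t] |= 1 << (j % t)
    y = [0] * n
    for bi in range(nb):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            xb = xw[col_idx[k]]
            for r in range(t):
                i = bi * t + r
                if i < n:
                    row_bits = (tiles[k] >> (r * t)) & mask  # row r of the tile
                    y[i] += bin(row_bits & xb).count("1")  # stands in for GPU __popc
    return y


A = [[0, 1, 1, 0, 0],
     [1, 0, 0, 0, 1],
     [0, 0, 0, 0, 0],
     [1, 1, 1, 1, 1],
     [0, 0, 1, 0, 0]]
rp, ci, tl = to_bit_blocks(A, 2)
print(bit_spmv(5, 2, rp, ci, tl, [1, 0, 1, 0, 1]))  # [1, 2, 0, 3, 1]
```

Each AND + popcount replaces up to t scalar multiply-adds, which is the kind of saving bit-level packing targets; the tile size t trades bitmap density against wasted zero bits, which is why an adaptive tile-size choice matters. For a BFS frontier step, the Boolean semiring would be used instead, i.e., treating `y[i] > 0` as "vertex i is reached".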

Pages: 53-67

Number of pages: 15