Accelerating matrix-centric graph processing on GPUs through bit-level optimizations

Cited by: 0

Authors:

Chen, Jou-An [1]

Sung, Hsin-Hsuan [1]

Shen, Xipeng [1]

Tallent, Nathan [2]

Barker, Kevin [2]

Li, Ang [2]

Affiliations:

[1] North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA

[2] Pacific Northwest Natl Lab, Richland, WA USA

Source:

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING | 2023 | Volume 177

Funding:

US National Science Foundation;

Keywords:

GraphBLAS; Bit manipulation; GPU; Sparse matrix; Deep reinforcement learning;

DOI:

10.1016/j.jpdc.2023.02.013

Chinese Library Classification:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

Even though it is well known that binary values are common in graph applications (e.g., adjacency matrix), how to leverage the phenomenon for efficiency has not yet been adequately explored. This paper presents a systematic study on how to unlock the potential of the bit-level optimizations of graph computations that involve binary values. It proposes a two-level representation named Bit-Block Compressed Sparse Row (B2SR) and presents a series of optimizations to the graph operations on B2SR using the intrinsics of modern GPUs. It additionally introduces Deep Reinforcement Learning (DRL) as an efficient way to best configure the bit-level optimizations on the fly. The DQN-based adaptive tile size selector with dedicated model training can reach 68% prediction accuracy. Evaluations on the NVIDIA Pascal and Volta GPUs show that the optimizations bring speedups of up to 40x and 6555x for the essential GraphBLAS kernels SpMV and SpGEMM, respectively, accelerating GraphBLAS-based BFS by up to 433x, SSSP, PR, and CC by up to 35x, and TC by up to 52x. (c) 2023 Elsevier Inc. All rights reserved.
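Illustrative note: the abstract does not spell out the B2SR layout, so the following is only a minimal CPU-side sketch of the general idea it names, under stated assumptions. A binary matrix is cut into fixed-size bit tiles, empty tiles are skipped and the rest are indexed CSR-style, and a matrix-vector product is computed with bitwise AND plus population count. The tile width `TILE = 8` and the names `pack_bit_tiles`, `pack_vector`, and `bit_spmv` are hypothetical and not taken from the paper; on a GPU the population count would typically map to an intrinsic such as CUDA's `__popc`.

```python
TILE = 8  # hypothetical fixed tile width; the paper selects tile sizes adaptively


def pack_bit_tiles(dense):
    """Pack a 0/1 matrix into CSR-indexed TILE x TILE bit tiles, skipping empty tiles."""
    n_rows, n_cols = len(dense), len(dense[0])
    tile_rows = (n_rows + TILE - 1) // TILE
    tile_cols = (n_cols + TILE - 1) // TILE
    row_ptr, col_idx, tiles = [0], [], []
    for tr in range(tile_rows):
        for tc in range(tile_cols):
            words = []
            for r in range(tr * TILE, min((tr + 1) * TILE, n_rows)):
                word = 0
                for c in range(tc * TILE, min((tc + 1) * TILE, n_cols)):
                    if dense[r][c]:
                        word |= 1 << (c - tc * TILE)  # one bit per matrix entry
                words.append(word)
            words += [0] * (TILE - len(words))  # pad a partial last tile row
            if any(words):  # store only tiles that contain at least one set bit
                col_idx.append(tc)
                tiles.append(words)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, tiles


def pack_vector(x):
    """Pack a 0/1 vector into one TILE-bit word per tile column."""
    words = [0] * ((len(x) + TILE - 1) // TILE)
    for i, v in enumerate(x):
        if v:
            words[i // TILE] |= 1 << (i % TILE)
    return words


def bit_spmv(row_ptr, col_idx, tiles, x_words, n_rows):
    """Return y with y[i] = count of j where A[i][j] = 1 and x[j] = 1."""
    y = [0] * n_rows
    for tr in range(len(row_ptr) - 1):
        for k in range(row_ptr[tr], row_ptr[tr + 1]):
            xw = x_words[col_idx[k]]
            for r, word in enumerate(tiles[k]):
                row = tr * TILE + r
                if row < n_rows:
                    # AND selects matching bits; counting them replaces
                    # TILE separate multiply-add steps.
                    y[row] += bin(word & xw).count("1")
    return y
```

In this sketch each tile row costs one AND and one bit count regardless of how many entries it holds, which is the kind of saving bit-level packing is meant to expose; the actual format and kernels are described in the paper itself.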


Pages: 53-67

Page count: 15
