Accelerating matrix-centric graph processing on GPUs through bit-level optimizations

Authors
Chen, Jou-An [1]
Sung, Hsin-Hsuan [1]
Shen, Xipeng [1]
Tallent, Nathan [2]
Barker, Kevin [2]
Li, Ang [2]
Affiliations
[1] North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA
[2] Pacific Northwest Natl Lab, Richland, WA USA
Funding
U.S. National Science Foundation
Keywords
GraphBLAS; Bit manipulation; GPU; Sparse matrix; Deep reinforcement learning;
DOI
10.1016/j.jpdc.2023.02.013
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Although binary values are well known to be common in graph applications (e.g., adjacency matrices), how to exploit them for efficiency has not yet been adequately explored. This paper presents a systematic study of how to unlock the potential of bit-level optimizations for graph computations that involve binary values. It proposes a two-level representation named Bit-Block Compressed Sparse Row (B2SR) and presents a series of optimizations to graph operations on B2SR that use the intrinsics of modern GPUs. It additionally introduces Deep Reinforcement Learning (DRL) as an efficient way to configure the bit-level optimizations on the fly. The DQN-based adaptive tile size selector, with dedicated model training, can reach 68% prediction accuracy. Evaluations on NVIDIA Pascal and Volta GPUs show that the optimizations yield speedups of up to 40x and 6555x for the essential GraphBLAS kernels SpMV and SpGEMM, respectively, and accelerate GraphBLAS-based BFS by up to 433x, SSSP, PR, and CC by up to 35x, and TC by up to 52x. (c) 2023 Elsevier Inc. All rights reserved.
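The abstract does not spell out the B2SR layout, so the following is only a minimal sketch of the general idea it describes: store the nonzero tiles of a binary matrix as bitmasks under a CSR-like index, then compute a binary SpMV with bitwise AND plus population count. The tile size `TILE`, the function names `pack_tiles`, `pack_vector`, and `bin_spmv`, and the exact tile layout are all assumptions made for illustration, not the authors' implementation. On a GPU, the population count would presumably map to a hardware intrinsic such as CUDA's `__popc`.

```python
TILE = 4  # hypothetical tile dimension; the paper selects tile sizes adaptively


def pack_tiles(dense, tile=TILE):
    """Pack a square 0/1 matrix into a CSR-like list of nonzero bit tiles.

    Returns (row_ptr, col_idx, tiles). tiles[k] holds one int bitmask per
    tile row; bit j is set when column j of that tile row is 1.
    """
    n = len(dense)
    n_blocks = (n + tile - 1) // tile
    row_ptr, col_idx, tiles = [0], [], []
    for bi in range(n_blocks):
        for bj in range(n_blocks):
            masks = []
            for r in range(bi * tile, min((bi + 1) * tile, n)):
                mask = 0
                for c in range(bj * tile, min((bj + 1) * tile, n)):
                    if dense[r][c]:
                        mask |= 1 << (c - bj * tile)
                masks.append(mask)
            if any(masks):  # all-zero tiles are not stored
                col_idx.append(bj)
                tiles.append(masks)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, tiles


def pack_vector(x, tile=TILE):
    """Pack a 0/1 vector into one bitmask per tile-sized segment."""
    segs = []
    for start in range(0, len(x), tile):
        mask = 0
        for j, v in enumerate(x[start:start + tile]):
            if v:
                mask |= 1 << j
        segs.append(mask)
    return segs


def bin_spmv(row_ptr, col_idx, tiles, xbits, n, tile=TILE):
    """y[i] = number of j with A[i][j] == 1 and x[j] == 1 (AND + popcount)."""
    y = [0] * n
    for bi in range(len(row_ptr) - 1):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            seg = xbits[col_idx[k]]
            for r, mask in enumerate(tiles[k]):
                y[bi * tile + r] += bin(mask & seg).count("1")
    return y
```

In this sketch each stored tile row costs one AND and one popcount regardless of how many ones it contains, which is the kind of saving that bit-level packing is meant to expose; the speedups reported in the abstract come from the authors' GPU kernels, not from code like this.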
Pages: 53-67
Page count: 15