Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU

Cited by: 0
Authors
Zhao, Zhixiang [1 ]
Wu, Yanxia [1 ]
Zhang, Guoyin [1 ]
Yang, Yiqing [1 ]
Hong, Ruize [1 ]
Institutions
[1] Harbin Engineering University, Department of Computer Science, Harbin, People's Republic of China
Source
Concurrency and Computation: Practice and Experience
Keywords
GPU; memory bandwidth; sparse matrices; SpMV; optimization; format; SIMD
DOI
10.1002/cpe.8366
CLC Classification Number
TP31 [Computer Software]
Subject Classification Code
081202; 0835
Abstract
Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in scientific computing, machine learning, and data analysis. The performance of SpMV on GPUs is crucial for accelerating many applications. However, the efficiency of SpMV on GPUs is significantly limited by irregular memory access patterns, high memory bandwidth requirements, and insufficient exploitation of parallelism. In this paper, we propose a Recursive Hybrid Compression (RHC) method to address these challenges. RHC begins by splitting the initial matrix into two portions: an ELLPACK (ELL) portion and a Coordinate (COO) portion. The COO portion is then recursively divided into further ELL and COO portions until a predefined termination criterion, based on a percentage threshold of the number of nonzero elements, is met. Additionally, we introduce a dynamic partitioning method that determines the optimal threshold for splitting the matrix into ELL and COO portions based on the distribution of nonzero elements and the memory footprint. We develop the RHC algorithm to fully exploit the advantages of the ELL kernel on GPUs and to achieve high thread-level parallelism. We evaluate the proposed method on two NVIDIA GPUs, the GeForce RTX 2080 Ti and the A100, using a set of sparse matrices from the SuiteSparse Matrix Collection, and compare RHC with NVIDIA's cuSPARSE library and three state-of-the-art methods: SELLP, MergeBase, and BalanceCSR. RHC achieves average speedups of 2.13×, 1.13×, 1.87×, and 1.27× over cuSPARSE, SELLP, MergeBase, and BalanceCSR, respectively.
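To make the recursive ELL/COO partitioning described above concrete, the following is a minimal CPU-side sketch in Python using SciPy, not the authors' GPU implementation. The function names (split_ell_coo, rhc_partition, rhc_spmv), the median-row-length width heuristic, and the stop_ratio termination parameter are assumptions introduced here for illustration; the paper's dynamic partitioning method chooses the ELL width from the nonzero distribution and memory footprint.

```python
# Illustrative sketch of recursive hybrid (ELL + COO) partitioning.
# Assumptions: median row length as the ELL width, stop_ratio as the
# nonzero-percentage termination criterion. Not the authors' code.
import numpy as np
import scipy.sparse as sp

def split_ell_coo(A_csr, k):
    """Keep the first k nonzeros of each row in an ELL block
    (rows padded to width k with zeros); spill the rest into a COO remainder."""
    n_rows, n_cols = A_csr.shape
    ell_cols = np.zeros((n_rows, k), dtype=np.int64)
    ell_vals = np.zeros((n_rows, k), dtype=A_csr.dtype)
    coo_r, coo_c, coo_v = [], [], []
    indptr, indices, data = A_csr.indptr, A_csr.indices, A_csr.data
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        take = min(k, end - start)
        ell_cols[i, :take] = indices[start:start + take]
        ell_vals[i, :take] = data[start:start + take]
        if end - start > k:                          # overflow rows go to COO
            coo_r.extend([i] * (end - start - k))
            coo_c.extend(indices[start + k:end])
            coo_v.extend(data[start + k:end])
    rest = sp.coo_matrix((coo_v, (coo_r, coo_c)), shape=(n_rows, n_cols))
    return (ell_cols, ell_vals), rest

def rhc_partition(A, stop_ratio=0.01, max_levels=8):
    """Recursively split the COO remainder into further ELL/COO levels until
    it holds fewer than stop_ratio of the original nonzeros."""
    A = A.tocsr()
    total_nnz = A.nnz
    levels = []
    for _ in range(max_levels):
        row_nnz = np.diff(A.indptr)
        k = max(1, int(np.median(row_nnz)))          # assumed width heuristic
        ell, rest = split_ell_coo(A, k)
        levels.append(ell)
        if rest.nnz == 0 or rest.nnz <= stop_ratio * total_nnz:
            break
        A = rest.tocsr()
    return levels, rest

def rhc_spmv(levels, rest, x):
    """y = A @ x accumulated over all ELL levels plus the final COO tail."""
    y = np.zeros(levels[0][1].shape[0], dtype=x.dtype)
    for cols, vals in levels:
        y += (vals * x[cols]).sum(axis=1)            # padded entries contribute 0
    y += rest.tocsr() @ x
    return y
```

In an actual GPU implementation along the lines sketched in the abstract, each ELL level would be handled by a regular, coalesced ELL kernel (one thread per row) and the small residual COO block by a load-balanced COO kernel, which is what allows the method to exploit the ELL kernel's regularity while bounding the padding overhead.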
Pages: 13