Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU

Cited by: 0
Authors
Zhao, Zhixiang [1 ]
Wu, Yanxia [1 ]
Zhang, Guoyin [1 ]
Yang, Yiqing [1 ]
Hong, Ruize [1 ]
Institutions
[1] Harbin Engineering University, Department of Computer Science, Harbin, People's Republic of China
Source
Concurrency and Computation: Practice and Experience
Keywords
GPU; memory bandwidth; sparse matrices; SpMV; optimization; format; SIMD
DOI
10.1002/cpe.8366
CLC Classification Number
TP31 [Computer Software]
Subject Classification Code
081202; 0835
Abstract
Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in scientific computing, machine learning, and data analysis. The performance of SpMV on GPUs is crucial for accelerating many applications. However, the efficiency of SpMV on GPUs is significantly limited by irregular memory access patterns, high memory bandwidth requirements, and insufficient exploitation of parallelism. In this paper, we propose a Recursive Hybrid Compression (RHC) method to address these challenges. RHC begins by splitting the initial matrix into two portions: an ELLPACK (ELL) portion and a Coordinate (COO) portion. The COO portion is then recursively divided into further ELL and COO portions until a predefined termination criterion, based on a percentage threshold of the number of nonzero elements, is met. Additionally, we introduce a dynamic partitioning method that determines the optimal threshold for splitting the matrix into ELL and COO portions based on the distribution of nonzero elements and the memory footprint. We develop the RHC algorithm to fully exploit the advantages of the ELL kernel on GPUs and to achieve high thread-level parallelism. We evaluate the proposed method on two NVIDIA GPUs, the GeForce RTX 2080 Ti and the A100, using a set of sparse matrices from the SuiteSparse Matrix Collection, and compare RHC with NVIDIA's cuSPARSE library and three state-of-the-art methods: SELLP, MergeBase, and BalanceCSR. RHC achieves average speedups of 2.13×, 1.13×, 1.87×, and 1.27× over cuSPARSE, SELLP, MergeBase, and BalanceCSR, respectively.
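To make the recursive ELL/COO partitioning described above concrete, the following is a minimal CPU-side sketch in Python using SciPy, not the authors' GPU implementation. The function names (split_ell_coo, rhc_partition, rhc_spmv), the median-row-length width heuristic, and the stop_ratio termination parameter are assumptions introduced here for illustration; the paper's dynamic partitioning method chooses the ELL width from the nonzero distribution and memory footprint.

```python
# Illustrative sketch of recursive hybrid (ELL + COO) partitioning.
# Assumptions: median row length as the ELL width, stop_ratio as the
# nonzero-percentage termination criterion. Not the authors' code.
import numpy as np
import scipy.sparse as sp

def split_ell_coo(A_csr, k):
    """Keep the first k nonzeros of each row in an ELL block
    (rows padded to width k with zeros); spill the rest into a COO remainder."""
    n_rows, n_cols = A_csr.shape
    ell_cols = np.zeros((n_rows, k), dtype=np.int64)
    ell_vals = np.zeros((n_rows, k), dtype=A_csr.dtype)
    coo_r, coo_c, coo_v = [], [], []
    indptr, indices, data = A_csr.indptr, A_csr.indices, A_csr.data
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        take = min(k, end - start)
        ell_cols[i, :take] = indices[start:start + take]
        ell_vals[i, :take] = data[start:start + take]
        if end - start > k:                          # overflow rows go to COO
            coo_r.extend([i] * (end - start - k))
            coo_c.extend(indices[start + k:end])
            coo_v.extend(data[start + k:end])
    rest = sp.coo_matrix((coo_v, (coo_r, coo_c)), shape=(n_rows, n_cols))
    return (ell_cols, ell_vals), rest

def rhc_partition(A, stop_ratio=0.01, max_levels=8):
    """Recursively split the COO remainder into further ELL/COO levels until
    it holds fewer than stop_ratio of the original nonzeros."""
    A = A.tocsr()
    total_nnz = A.nnz
    levels = []
    for _ in range(max_levels):
        row_nnz = np.diff(A.indptr)
        k = max(1, int(np.median(row_nnz)))          # assumed width heuristic
        ell, rest = split_ell_coo(A, k)
        levels.append(ell)
        if rest.nnz == 0 or rest.nnz <= stop_ratio * total_nnz:
            break
        A = rest.tocsr()
    return levels, rest

def rhc_spmv(levels, rest, x):
    """y = A @ x accumulated over all ELL levels plus the final COO tail."""
    y = np.zeros(levels[0][1].shape[0], dtype=x.dtype)
    for cols, vals in levels:
        y += (vals * x[cols]).sum(axis=1)            # padded entries contribute 0
    y += rest.tocsr() @ x
    return y
```

In an actual GPU implementation along the lines sketched in the abstract, each ELL level would be handled by a regular, coalesced ELL kernel (one thread per row) and the small residual COO block by a load-balanced COO kernel, which is what allows the method to exploit the ELL kernel's regularity while bounding the padding overhead.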
Pages: 13