Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU

Cited by: 0
Authors
Zhao, Zhixiang [1 ]
Wu, Yanxia [1 ]
Zhang, Guoyin [1 ]
Yang, Yiqing [1 ]
Hong, Ruize [1 ]
Affiliations
[1] Harbin Engn Univ, Dept Comp Sci, Harbin, Peoples R China
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
Keywords
GPU; memory bandwidth; sparse matrices; SpMV; optimization; format; SIMD
DOI
10.1002/cpe.8366
Chinese Library Classification (CLC)
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in scientific computing, machine learning, and data analysis. The performance of SpMV on GPUs is crucial for accelerating various applications. However, the efficiency of SpMV on GPUs is significantly affected by irregular memory access patterns, high memory bandwidth requirements, and insufficient exploitation of parallelism. In this paper, we propose a Recursive Hybrid Compression (RHC) method to address these challenges. RHC begins by splitting the initial matrix into two portions: an ELLPACK (ELL) portion and a Coordinate (COO) portion. The COO portion is then recursively split into further ELL and COO portions, continuing until a predefined termination criterion, a percentage threshold on the number of remaining nonzero elements, is met. Additionally, we introduce a dynamic partitioning method that determines the optimal threshold for splitting the matrix into ELL and COO portions based on the distribution of nonzero elements and the memory footprint. We develop the RHC algorithm to fully exploit the advantages of the ELL kernel on GPUs and achieve high thread-level parallelism. We evaluate our method on two NVIDIA GPUs, the GeForce RTX 2080 Ti and the A100, using sparse matrices from the SuiteSparse Matrix Collection, and compare RHC with NVIDIA's cuSPARSE library and three state-of-the-art methods: SELLP, MergeBase, and BalanceCSR. RHC achieves average speedups of 2.13×, 1.13×, 1.87×, and 1.27× over cuSPARSE, SELLP, MergeBase, and BalanceCSR, respectively.
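The recursive ELL/COO split described above can be made concrete with a short host-side sketch. The Python/SciPy code below is a minimal illustration under stated assumptions, not the authors' implementation: the names rhc_partition and ell_width, the percentile rule standing in for the paper's dynamic partitioning method, and the 5% stopping fraction are all hypothetical choices.

```python
# Hypothetical host-side sketch of Recursive Hybrid Compression (RHC):
# split a CSR matrix into an ELL portion plus a COO remainder, then
# recurse on the remainder until it holds only a small fraction of the
# original nonzeros. Names and thresholds are illustrative assumptions.
import numpy as np
import scipy.sparse as sp

def ell_width(csr, threshold):
    """Choose the ELL slab width from the row-length distribution.
    A simple percentile rule stands in for the paper's dynamic
    partitioning heuristic."""
    row_lens = np.diff(csr.indptr)
    return max(1, int(np.percentile(row_lens, threshold * 100)))

def rhc_partition(csr, threshold=0.7, min_nnz_frac=0.05, _total_nnz=None):
    """Return ([(ell_values, ell_columns), ...], coo_tail): a list of
    ELL blocks plus a final COO remainder. Recursion stops once the
    remainder holds fewer than min_nnz_frac of the original nonzeros."""
    if _total_nnz is None:
        _total_nnz = csr.nnz
    width = ell_width(csr, threshold)
    n_rows = csr.shape[0]

    # ELL portion: the first `width` entries of every row, zero-padded.
    ell_vals = np.zeros((n_rows, width))
    ell_cols = np.zeros((n_rows, width), dtype=np.int64)
    # COO remainder: entries beyond `width` in the long rows.
    rem_r, rem_c, rem_v = [], [], []
    for r in range(n_rows):
        lo, hi = csr.indptr[r], csr.indptr[r + 1]
        k = min(width, hi - lo)
        ell_vals[r, :k] = csr.data[lo:lo + k]
        ell_cols[r, :k] = csr.indices[lo:lo + k]
        rem_r.extend([r] * (hi - lo - k))
        rem_c.extend(csr.indices[lo + k:hi])
        rem_v.extend(csr.data[lo + k:hi])

    tail = sp.coo_matrix((rem_v, (rem_r, rem_c)), shape=csr.shape)
    if tail.nnz == 0 or tail.nnz <= min_nnz_frac * _total_nnz:
        return [(ell_vals, ell_cols)], tail
    blocks, tail = rhc_partition(tail.tocsr(), threshold,
                                 min_nnz_frac, _total_nnz)
    return [(ell_vals, ell_cols)] + blocks, tail

# Usage: the ELL blocks and the COO tail together reproduce A @ x.
A = sp.random(1000, 1000, density=0.01, format="csr")
x = np.random.rand(1000)
blocks, tail = rhc_partition(A)
y = tail @ x
for vals, cols in blocks:
    y += (vals * x[cols]).sum(axis=1)
assert np.allclose(y, A @ x)
```

On the GPU, each (values, columns) pair returned by this sketch corresponds to a regularly strided ELL block suited to a coalesced ELL kernel, while the small COO tail is processed separately; the final assertion checks that the pieces reassemble the original SpMV product.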
Pages: 13