Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU

Cited by: 0
Authors
Zhao, Zhixiang [1 ]
Wu, Yanxia [1 ]
Zhang, Guoyin [1 ]
Yang, Yiqing [1 ]
Hong, Ruize [1 ]
Affiliations
[1] Harbin Engn Univ, Dept Comp Sci, Harbin, Peoples R China
Source
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
Keywords
GPU; memory bandwidth; sparse matrices; SpMV; optimization; format; SIMD
DOI
10.1002/cpe.8366
Chinese Library Classification (CLC)
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in scientific computing, machine learning, and data analysis. The performance of SpMV on GPUs is crucial for accelerating various applications. However, the efficiency of SpMV on GPUs is significantly affected by irregular memory access patterns, high memory bandwidth requirements, and insufficient exploitation of parallelism. In this paper, we propose a Recursive Hybrid Compression (RHC) method to address these challenges. RHC begins by splitting the initial matrix into two portions: an ELLPACK (ELL) portion and a Coordinate (COO) portion. The COO portion is then recursively split into further ELL and COO portions, continuing until a predefined termination criterion, a percentage threshold on the number of remaining nonzero elements, is met. Additionally, we introduce a dynamic partitioning method that determines the optimal threshold for splitting the matrix into ELL and COO portions based on the distribution of nonzero elements and the memory footprint. We develop the RHC algorithm to fully exploit the advantages of the ELL kernel on GPUs and achieve high thread-level parallelism. We evaluate our method on two NVIDIA GPUs, the GeForce RTX 2080 Ti and the A100, using sparse matrices from the SuiteSparse Matrix Collection, and compare RHC with NVIDIA's cuSPARSE library and three state-of-the-art methods: SELLP, MergeBase, and BalanceCSR. RHC achieves average speedups of 2.13×, 1.13×, 1.87×, and 1.27× over cuSPARSE, SELLP, MergeBase, and BalanceCSR, respectively.
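The recursive ELL/COO split described above can be made concrete with a short host-side sketch. The Python/SciPy code below is a minimal illustration under stated assumptions, not the authors' implementation: the names rhc_partition and ell_width, the percentile rule standing in for the paper's dynamic partitioning method, and the 5% stopping fraction are all hypothetical choices.

```python
# Hypothetical host-side sketch of Recursive Hybrid Compression (RHC):
# split a CSR matrix into an ELL portion plus a COO remainder, then
# recurse on the remainder until it holds only a small fraction of the
# original nonzeros. Names and thresholds are illustrative assumptions.
import numpy as np
import scipy.sparse as sp

def ell_width(csr, threshold):
    """Choose the ELL slab width from the row-length distribution.
    A simple percentile rule stands in for the paper's dynamic
    partitioning heuristic."""
    row_lens = np.diff(csr.indptr)
    return max(1, int(np.percentile(row_lens, threshold * 100)))

def rhc_partition(csr, threshold=0.7, min_nnz_frac=0.05, _total_nnz=None):
    """Return ([(ell_values, ell_columns), ...], coo_tail): a list of
    ELL blocks plus a final COO remainder. Recursion stops once the
    remainder holds fewer than min_nnz_frac of the original nonzeros."""
    if _total_nnz is None:
        _total_nnz = csr.nnz
    width = ell_width(csr, threshold)
    n_rows = csr.shape[0]

    # ELL portion: the first `width` entries of every row, zero-padded.
    ell_vals = np.zeros((n_rows, width))
    ell_cols = np.zeros((n_rows, width), dtype=np.int64)
    # COO remainder: entries beyond `width` in the long rows.
    rem_r, rem_c, rem_v = [], [], []
    for r in range(n_rows):
        lo, hi = csr.indptr[r], csr.indptr[r + 1]
        k = min(width, hi - lo)
        ell_vals[r, :k] = csr.data[lo:lo + k]
        ell_cols[r, :k] = csr.indices[lo:lo + k]
        rem_r.extend([r] * (hi - lo - k))
        rem_c.extend(csr.indices[lo + k:hi])
        rem_v.extend(csr.data[lo + k:hi])

    tail = sp.coo_matrix((rem_v, (rem_r, rem_c)), shape=csr.shape)
    if tail.nnz == 0 or tail.nnz <= min_nnz_frac * _total_nnz:
        return [(ell_vals, ell_cols)], tail
    blocks, tail = rhc_partition(tail.tocsr(), threshold,
                                 min_nnz_frac, _total_nnz)
    return [(ell_vals, ell_cols)] + blocks, tail

# Usage: the ELL blocks and the COO tail together reproduce A @ x.
A = sp.random(1000, 1000, density=0.01, format="csr")
x = np.random.rand(1000)
blocks, tail = rhc_partition(A)
y = tail @ x
for vals, cols in blocks:
    y += (vals * x[cols]).sum(axis=1)
assert np.allclose(y, A @ x)
```

On the GPU, each (values, columns) pair returned by this sketch corresponds to a regularly strided ELL block suited to a coalesced ELL kernel, while the small COO tail is processed separately; the final assertion checks that the pieces reassemble the original SpMV product.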
Pages: 13