TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs

Authors
Ji, Haonan [1]
Song, Huimin [1]
Lu, Shibo [2]
Jin, Zhou [1]
Tan, Guangming [3]
Liu, Weifeng [1]
Affiliations
[1] China Univ Petr, Super Sci Software Lab, Beijing, Peoples R China
[2] Northeastern Univ, Boston, MA USA
[3] Chinese Acad Sci, State Key Lab Comp Architecture, Inst Comp Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Sparse matrix; SpMSpV; BFS; Tiling; GPU; Graph algorithms; Library
DOI
10.1145/3545008.3545028
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Sparse matrix-sparse vector multiplication (SpMSpV) is an important primitive for graph algorithms and machine learning applications. Because both the input and output vectors are sparse, its floating-point efficiency is generally lower than that of sparse matrix-vector multiplication (SpMV) and sparse matrix-matrix multiplication (SpGEMM). Existing parallel SpMSpV methods have focused on various row- and column-wise storage formats and merging operations, while largely ignoring the data locality and sparsity pattern of the input matrix and vector. In this paper, we propose TileSpMSpV, a tiled algorithm for accelerating SpMSpV on GPUs. First, tile-wise storage structures are developed for quickly locating a group of nonzeros in the matrix and vectors. Then, we develop the TileSpMSpV algorithm on top of these storage structures. In addition, to accelerate direction-optimizing breadth-first search (BFS) using TileSpMSpV, we propose a TileBFS algorithm comprising three kernels called Push-CSC, Push-CSR and Pull-CSC. In experiments running on a high-end NVIDIA GPU and using 2757 sparse matrices, the TileSpMSpV algorithm outperforms TileSpMV, cuSPARSE and CombBLAS by average factors of 1.83, 17.18 and 17.20 (up to 7.68, 1050.02 and 235.90), respectively. Moreover, our TileBFS algorithm outperforms Gunrock and GSwitch by average factors of 2.88 and 4.52 (up to 21.35 and 1000.85), respectively.
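For readers unfamiliar with the primitive, the following is a minimal sequential sketch of what SpMSpV computes and how a push-style BFS step can be phrased as one SpMSpV. It is not the tiled GPU algorithm from the paper: it assumes a plain CSC matrix and a dictionary accumulator, and the names `spmspv_csc` and `bfs_push` are hypothetical, chosen only for illustration.

```python
def spmspv_csc(col_ptr, row_idx, vals, x_idx, x_vals):
    """Compute y = A @ x, with A in CSC form and x given as (indices, values).

    Only the columns of A selected by the nonzeros of x are visited,
    which is what distinguishes SpMSpV from SpMV.
    """
    acc = {}  # sparse accumulator: row index -> partial sum
    for j, xj in zip(x_idx, x_vals):
        for p in range(col_ptr[j], col_ptr[j + 1]):
            i = row_idx[p]
            acc[i] = acc.get(i, 0.0) + vals[p] * xj
    y_idx = sorted(acc)
    return y_idx, [acc[i] for i in y_idx]


def bfs_push(col_ptr, row_idx, source, n):
    """Level-synchronous BFS where each step is one SpMSpV.

    Assumes column j of A lists the out-neighbours of vertex j.
    Returns the BFS level of every vertex (-1 if unreachable).
    """
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    ones = [1.0] * len(row_idx)  # only the pattern of A matters for BFS
    depth = 0
    while frontier:
        depth += 1
        reached, _ = spmspv_csc(col_ptr, row_idx, ones,
                                frontier, [1.0] * len(frontier))
        # Keep only vertices not seen before; they form the next frontier.
        frontier = [v for v in reached if level[v] == -1]
        for v in frontier:
            level[v] = depth
    return level


# Example graph with edges 0->1, 0->2, 1->3, 2->3 (4 vertices).
col_ptr = [0, 2, 3, 4, 4]
row_idx = [1, 2, 3, 3]
vals = [2.0, 3.0, 4.0, 5.0]

y_idx, y_vals = spmspv_csc(col_ptr, row_idx, vals, [1, 2], [10.0, 1.0])
levels = bfs_push(col_ptr, row_idx, 0, 4)
```

In this sketch the work is proportional to the nonzeros in the selected columns rather than to the whole matrix. A pull-style step would instead scan unvisited vertices and check their incoming edges against the frontier, which is the trade-off that direction-optimizing BFS switches between.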
Pages: 11