TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs

Cited by: 0

Authors:

Ji, Haonan [1]

Song, Huimin [1]

Lu, Shibo [2]

Jin, Zhou [1]

Tan, Guangming [3]

Liu, Weifeng [1]

Affiliations:

[1] China Univ Petr, Super Sci Software Lab, Beijing, Peoples R China

[2] Northeastern Univ, Boston, MA USA

[3] Chinese Acad Sci, State Key Lab Comp Architecture, Inst Comp Technol, Beijing, Peoples R China

Source:

51st International Conference on Parallel Processing (ICPP 2022) | 2022

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

Sparse matrix; SpMSpV; BFS; Tiling; GPU; GRAPH ALGORITHMS; LIBRARY;

DOI:

10.1145/3545008.3545028

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202;

Abstract:

Sparse matrix-sparse vector multiplication (SpMSpV) is an important primitive for graph algorithms and machine learning applications. The sparsity of the input and output vectors makes its floating-point efficiency generally lower than that of sparse matrix-vector multiplication (SpMV) and sparse matrix-matrix multiplication (SpGEMM). Existing parallel SpMSpV methods have focused on various row- and column-wise storage formats and merging operations. However, the data locality and sparsity pattern of the input matrix and vector are largely ignored. In this paper, we propose TileSpMSpV, a tiled algorithm for accelerating SpMSpV on GPUs. First, tile-wise storage structures are developed for quickly locating a group of nonzeros in the matrix and vectors. Then, we develop the TileSpMSpV algorithm on top of these storage structures. In addition, to accelerate direction-optimizing breadth-first search (BFS) with TileSpMSpV, we propose a TileBFS algorithm comprising three kernels called Push-CSC, Push-CSR and Pull-CSC. In experiments running on a high-end NVIDIA GPU and using 2757 sparse matrices, the TileSpMSpV algorithm outperforms TileSpMV, cuSPARSE and CombBLAS by factors of 1.83, 17.18 and 17.20 on average (up to 7.68, 1050.02 and 235.90), respectively. Moreover, our TileBFS algorithm outperforms Gunrock and GSwitch by factors of 2.88 and 4.52 on average (up to 21.35 and 1000.85), respectively.
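For readers unfamiliar with the primitive, the following is a minimal sequential sketch of column-wise SpMSpV, y = A x, with A stored in CSC form and x given as parallel index and value lists. It is only a reference illustration of the operation the abstract names. It is not the TileSpMSpV algorithm, it uses no tiling and no GPU, and the function name `spmspv_csc` and its parameters are hypothetical.

```python
from collections import defaultdict


def spmspv_csc(col_ptr, row_idx, vals, x_idx, x_val):
    """Reference SpMSpV: y = A @ x, A in CSC, x sparse as (indices, values).

    Only the columns of A selected by the nonzeros of x are visited,
    so the work depends on the sparsity of x rather than on the full matrix.
    """
    acc = defaultdict(float)  # sparse accumulator keyed by row index
    for j, xj in zip(x_idx, x_val):
        # Scale column j of A by x[j] and merge it into the accumulator.
        for p in range(col_ptr[j], col_ptr[j + 1]):
            acc[row_idx[p]] += vals[p] * xj
    y_idx = sorted(acc)
    return y_idx, [acc[i] for i in y_idx]


# Illustrative 3x3 matrix A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSC form.
col_ptr = [0, 2, 3, 5]
row_idx = [0, 2, 1, 0, 2]
vals = [1.0, 4.0, 3.0, 2.0, 5.0]

# Sparse vector x = [1, 0, 1]: nonzeros at positions 0 and 2.
y_idx, y_val = spmspv_csc(col_ptr, row_idx, vals, [0, 2], [1.0, 1.0])
print(y_idx, y_val)
```

In a BFS setting, x would play the role of the current frontier and the nonzero positions of y the candidate next frontier, which is why the cost of skipping unselected columns matters.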


Pages: 11

50 items in total

[1] TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs
Niu, Yuyao; Lu, Zhengyang; Dong, Meichen; Jin, Zhou; Liu, Weifeng; Tan, Guangming
[J]. 2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021: 68-78
[2] A work-efficient parallel sparse matrix-sparse vector multiplication algorithm
Azad, Ariful; Buluc, Aydin
[J]. 2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2017: 688-697
[3] TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs
Niu, Yuyao; Lu, Zhengyang; Ji, Haonan; Song, Shuhui; Jin, Zhou; Liu, Weifeng
[J]. PPOPP'22: PROCEEDINGS OF THE 27TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2022: 90-106
[4] Merge-based Parallel Sparse Matrix-Sparse Vector Multiplication with a Vector Architecture
Li, Haoran; Yokoyama, Harumichi; Araki, Takuya
[J]. IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2018: 43-50
[5] Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU
Yang, Carl; Wang, Yangzihao; Owens, John D.
[J]. 2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015: 841-847
[6] Optimization techniques for sparse matrix-vector multiplication on GPUs
Maggioni, Marco; Berger-Wolf, Tanya
[J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2016, 93-94: 66-86
[7] On Implementing Sparse Matrix Multi-Vector Multiplication on GPUs
Abu-Sufah, Walid; Ahmad, Khalid
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014: 1117-1124
[8] Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications
Ashari, Arash; Sedaghati, Naser; Eisenlohr, John; Parthasarathy, Srinivasan; Sadayappan, P.
[J]. SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014: 781-792
[9] Dense and Sparse Matrix-Vector Multiplication on Maxwell GPUs with PyCUDA
Nurudin Alvarez, Francisco; Antonio Ortega-Toro, Jose; Ujaldon, Manuel
[J]. HIGH PERFORMANCE COMPUTING CARLA 2016, 2017, 697: 219-229
[10] Characterizing Dataset Dependence for Sparse Matrix-Vector Multiplication on GPUs
Sedaghati, Naser; Ashari, Arash; Pouchet, Louis-Noel; Parthasarathy, Srinivasan; Sadayappan, P.
[J]. 2ND WORKSHOP ON PARALLEL PROGRAMMING FOR ANALYTICS APPLICATIONS (PPAA 2015), 2015: 17-24
