LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows

Authors:

Yongchao Liu

Bertil Schmidt

Affiliations:

[1] Georgia Institute of Technology, School of Computational Science & Engineering

[2] Johannes Gutenberg University Mainz, Institute of Computer Science

Source:

Journal of Signal Processing Systems | 2018, Vol. 90, No. 1

Keywords:

Sparse matrix-vector multiplication; Compressed sparse row; CUDA; GPU

Abstract:

Compressed sparse row (CSR) is one of the most frequently used sparse matrix storage formats. However, the efficiency of existing CUDA-compatible CSR-based sparse matrix-vector multiplication (SpMV) implementations is relatively low. We address this issue by presenting LightSpMV, a parallelized CSR-based SpMV implementation programmed in CUDA C++. This algorithm achieves high speed by employing atomic and warp shuffle instructions to implement fine-grained dynamic distribution of matrix rows over vectors/warps as well as efficient vector dot product computation. Moreover, we propose a unified cache hit rate computation approach to consistently investigate the caching behavior of different SpMV kernels, which may deploy their data differently in the hierarchical memory space of CUDA-enabled GPUs. We have assessed LightSpMV using a set of sparse matrices and compared it to the CSR-based SpMV kernels in the top-performing CUSP, ViennaCL and cuSPARSE libraries. Our experimental results demonstrate that LightSpMV is superior to CUSP, ViennaCL and cuSPARSE on the same Kepler-based Tesla K40c GPU, running up to 2.63× and 2.65× faster than CUSP, up to 2.52× and 1.96× faster than ViennaCL, and up to 1.94× and 1.79× faster than cuSPARSE with respect to single and double precision, respectively. In addition, when accelerating the PageRank graph application, LightSpMV consistently outperforms the aforementioned three counterparts. LightSpMV is open-source and publicly available at http://lightspmv.sourceforge.net.
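For readers unfamiliar with the CSR layout the abstract refers to, the following is a minimal sequential sketch of CSR-based SpMV in Python. It is not LightSpMV's CUDA kernel and does not use atomics or warp shuffles; it only illustrates the per-row dot products that such a kernel would distribute over vectors/warps. The function name `csr_spmv`, the array names `row_ptr`, `col_idx` and `values`, and the 3x3 example matrix are assumptions made for illustration.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (sequential reference)."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        # The nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix (not taken from the paper):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)  # prints [3.0, 3.0, 9.0]
```

Each iteration of the outer loop is independent of the others, which is what allows a GPU implementation to assign rows to groups of threads; how LightSpMV does this in detail is described in the paper itself, not in this sketch.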

Pages: 69-86

Page count: 18
