LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows

Cited by: 0
Authors
Yongchao Liu
Bertil Schmidt
Affiliations
[1] Georgia Institute of Technology, School of Computational Science & Engineering
[2] Johannes Gutenberg University Mainz, Institute of Computer Science
Keywords
Sparse matrix-vector multiplication; Compressed sparse row; CUDA; GPU
DOI
Not available
Abstract
Compressed sparse row (CSR) is one of the most frequently used sparse matrix storage formats. However, the efficiency of existing CUDA-compatible CSR-based sparse matrix-vector multiplication (SpMV) implementations is relatively low. We address this issue by presenting LightSpMV, a parallelized CSR-based SpMV implementation programmed in CUDA C++. This algorithm achieves high speed by employing atomic and warp shuffle instructions to implement fine-grained dynamic distribution of matrix rows over vectors/warps as well as efficient vector dot product computation. Moreover, we propose a unified cache hit rate computation approach to consistently investigate the caching behavior of different SpMV kernels, which may deploy data differently across the hierarchical memory space of CUDA-enabled GPUs. We have assessed LightSpMV using a set of sparse matrices and compared it to the CSR-based SpMV kernels in the top-performing CUSP, ViennaCL and cuSPARSE libraries. Our experimental results demonstrate that LightSpMV outperforms CUSP, ViennaCL and cuSPARSE on the same Kepler-based Tesla K40c GPU, running up to 2.63× and 2.65× faster than CUSP, up to 2.52× and 1.96× faster than ViennaCL, and up to 1.94× and 1.79× faster than cuSPARSE for single and double precision, respectively. In addition, when accelerating the PageRank graph application, LightSpMV remains consistently superior to all three counterparts. LightSpMV is open-source and publicly available at http://lightspmv.sourceforge.net.
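To make the row-distribution and reduction scheme concrete, the sketch below shows a CSR-based SpMV kernel in which each 32-thread warp claims its next row through a global atomic counter (fine-grained dynamic row distribution) and combines the per-lane partial products of the dot product with warp shuffle instructions. This is a minimal illustration under assumed names, not LightSpMV's actual code: the identifiers (csrSpmvWarpDynamic, nextRow, and so on) are invented for this example, and the vector-level (sub-warp) variants mentioned in the abstract as well as the double-precision path are omitted.

// Minimal sketch (assumed names, CUDA C++). A warp repeatedly grabs the
// next unprocessed row index via atomicAdd on a global counter, the 32
// lanes stride across that row's nonzeros, and the partial sums are
// reduced with __shfl_down_sync. Requires CUDA 9+ (__shfl*_sync); the
// Kepler-era original would have used the pre-sync __shfl intrinsics.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void csrSpmvWarpDynamic(int numRows,
                                   const int *rowPtr, const int *colIdx,
                                   const float *vals, const float *x,
                                   float *y, int *nextRow)
{
    const int lane = threadIdx.x & 31;  // lane index within the warp
    int row = 0;
    for (;;) {
        // Lane 0 claims the next row; the index is broadcast to all lanes.
        if (lane == 0) row = atomicAdd(nextRow, 1);
        row = __shfl_sync(0xffffffffu, row, 0);
        if (row >= numRows) break;  // same row on every lane -> uniform exit

        // Each lane accumulates a 32-stride slice of the row's dot product.
        float sum = 0.0f;
        for (int j = rowPtr[row] + lane; j < rowPtr[row + 1]; j += 32)
            sum += vals[j] * x[colIdx[j]];

        // Warp-level tree reduction using shuffle instructions.
        for (int off = 16; off > 0; off >>= 1)
            sum += __shfl_down_sync(0xffffffffu, sum, off);
        if (lane == 0) y[row] = sum;
    }
}

int main()
{
    // Toy 3x3 matrix [[1 0 2], [0 3 0], [4 0 5]] in CSR form, x = (1,1,1).
    const int n = 3;
    int   hRowPtr[] = {0, 2, 3, 5};
    int   hColIdx[] = {0, 2, 1, 0, 2};
    float hVals[]   = {1.f, 2.f, 3.f, 4.f, 5.f};
    float hX[]      = {1.f, 1.f, 1.f};
    float hY[n];
    int   zero = 0;

    int *dRowPtr, *dColIdx, *dNext;
    float *dVals, *dX, *dY;
    cudaMalloc(&dRowPtr, sizeof(hRowPtr));
    cudaMalloc(&dColIdx, sizeof(hColIdx));
    cudaMalloc(&dVals,   sizeof(hVals));
    cudaMalloc(&dX,      sizeof(hX));
    cudaMalloc(&dY,      sizeof(hY));
    cudaMalloc(&dNext,   sizeof(int));
    cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
    cudaMemcpy(dColIdx, hColIdx, sizeof(hColIdx), cudaMemcpyHostToDevice);
    cudaMemcpy(dVals,   hVals,   sizeof(hVals),   cudaMemcpyHostToDevice);
    cudaMemcpy(dX,      hX,      sizeof(hX),      cudaMemcpyHostToDevice);
    cudaMemcpy(dNext,   &zero,   sizeof(int),     cudaMemcpyHostToDevice);

    csrSpmvWarpDynamic<<<1, 64>>>(n, dRowPtr, dColIdx, dVals, dX, dY, dNext);
    cudaMemcpy(hY, dY, sizeof(hY), cudaMemcpyDeviceToHost);
    printf("y = [%g %g %g]\n", hY[0], hY[1], hY[2]);  // expected: [3 3 9]
    return 0;
}

The atomic counter is what provides load balance on matrices with skewed row lengths: a warp that finishes a short row immediately claims another instead of being bound to a static row range. For an application such as PageRank, a kernel of this shape would simply be invoked once per iteration on the transition matrix.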
Pages: 69-86 (17 pages)
Related papers (records 21-30 of 50 shown)
  • [21] The Sliced COO format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs
    Dang, Hoang-Vu
    Schmidt, Bertil
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2012, 2012, 9: 57-66
  • [22] Sparse matrix-vector multiplication - final solution?
    Simecek, Ivan
    Tvrdik, Pavel
    PARALLEL PROCESSING AND APPLIED MATHEMATICS, 2008, 4967: 156-165
  • [23] On improving the performance of sparse matrix-vector multiplication
    White, JB
    Sadayappan, P
    FOURTH INTERNATIONAL CONFERENCE ON HIGH-PERFORMANCE COMPUTING, PROCEEDINGS, 1997: 66-71
  • [24] Cache-oblivious sparse matrix-vector multiplication by using sparse matrix partitioning methods
    Yzelman, A. N.
    Bisseling, Rob H.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2009, 31(4): 3128-3154
  • [25] Efficient FCM Computations Using Sparse Matrix-Vector Multiplication
    Puheim, Michal
    Vascak, Jan
    Machova, Kristina
    2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016: 4165-4170
  • [26] Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU
    Zhang, Jilin
    Liu, Enyi
    Wan, Jian
    Ren, Yongjian
    Yue, Miao
    Wang, Jue
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7(2): 473-482
  • [27] Adaptive sparse matrix representation for efficient matrix-vector multiplication
    Zardoshti, Pantea
    Khunjush, Farshad
    Sarbazi-Azad, Hamid
    JOURNAL OF SUPERCOMPUTING, 2016, 72(9): 3366-3386
  • [28] Communication balancing in parallel sparse matrix-vector multiplication
    Bisseling, RH
    Meesen, W
    ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2005, 21: 47-65
  • [29] Sparse matrix-vector multiplication on network-on-chip
    Sun, C-C
    Goetze, J.
    Jheng, H-Y
    Ruan, S-J
    ADVANCES IN RADIO SCIENCE, 2010, 8: 289-294
  • [30] Optimization by Runtime Specialization for Sparse Matrix-Vector Multiplication
    Kamin, Sam
    Garzaran, Maria Jesus
    Aktemur, Baris
    Xu, Danqing
    Yilmaz, Buse
    Chen, Zhongbo
    ACM SIGPLAN NOTICES, 2015, 50(3): 93-102