LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows

Cited by: 0
Authors
Yongchao Liu
Bertil Schmidt
Affiliations
[1] Georgia Institute of Technology, School of Computational Science & Engineering
[2] Johannes Gutenberg University Mainz, Institute of Computer Science
Keywords
Sparse matrix-vector multiplication; Compressed sparse row; CUDA; GPU
DOI
Not available
Abstract
Compressed sparse row (CSR) is one of the most frequently used sparse matrix storage formats. However, the efficiency of existing CUDA-compatible CSR-based sparse matrix-vector multiplication (SpMV) implementations is relatively low. We address this issue by presenting LightSpMV, a parallelized CSR-based SpMV implementation programmed in CUDA C++. This algorithm achieves high speed by employing atomic and warp shuffle instructions to implement fine-grained dynamic distribution of matrix rows over vectors/warps as well as efficient vector dot product computation. Moreover, we propose a unified cache hit rate computation approach to consistently investigate the caching behavior of different SpMV kernels, which may deploy data differently across the hierarchical memory space of CUDA-enabled GPUs. We have assessed LightSpMV using a set of sparse matrices and compared it to the CSR-based SpMV kernels in the top-performing CUSP, ViennaCL and cuSPARSE libraries. Our experimental results demonstrate that LightSpMV outperforms CUSP, ViennaCL and cuSPARSE on the same Kepler-based Tesla K40c GPU, running up to 2.63× and 2.65× faster than CUSP, up to 2.52× and 1.96× faster than ViennaCL, and up to 1.94× and 1.79× faster than cuSPARSE for single and double precision, respectively. In addition, when accelerating the PageRank graph application, LightSpMV remains consistently superior to all three counterparts. LightSpMV is open-source and publicly available at http://lightspmv.sourceforge.net.
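To make the row-distribution and reduction scheme concrete, the sketch below shows a CSR-based SpMV kernel in which each 32-thread warp claims its next row through a global atomic counter (fine-grained dynamic row distribution) and combines the per-lane partial products of the dot product with warp shuffle instructions. This is a minimal illustration under assumed names, not LightSpMV's actual code: the identifiers (csrSpmvWarpDynamic, nextRow, and so on) are invented for this example, and the vector-level (sub-warp) variants mentioned in the abstract as well as the double-precision path are omitted.

// Minimal sketch (assumed names, CUDA C++). A warp repeatedly grabs the
// next unprocessed row index via atomicAdd on a global counter, the 32
// lanes stride across that row's nonzeros, and the partial sums are
// reduced with __shfl_down_sync. Requires CUDA 9+ (__shfl*_sync); the
// Kepler-era original would have used the pre-sync __shfl intrinsics.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void csrSpmvWarpDynamic(int numRows,
                                   const int *rowPtr, const int *colIdx,
                                   const float *vals, const float *x,
                                   float *y, int *nextRow)
{
    const int lane = threadIdx.x & 31;  // lane index within the warp
    int row = 0;
    for (;;) {
        // Lane 0 claims the next row; the index is broadcast to all lanes.
        if (lane == 0) row = atomicAdd(nextRow, 1);
        row = __shfl_sync(0xffffffffu, row, 0);
        if (row >= numRows) break;  // same row on every lane -> uniform exit

        // Each lane accumulates a 32-stride slice of the row's dot product.
        float sum = 0.0f;
        for (int j = rowPtr[row] + lane; j < rowPtr[row + 1]; j += 32)
            sum += vals[j] * x[colIdx[j]];

        // Warp-level tree reduction using shuffle instructions.
        for (int off = 16; off > 0; off >>= 1)
            sum += __shfl_down_sync(0xffffffffu, sum, off);
        if (lane == 0) y[row] = sum;
    }
}

int main()
{
    // Toy 3x3 matrix [[1 0 2], [0 3 0], [4 0 5]] in CSR form, x = (1,1,1).
    const int n = 3;
    int   hRowPtr[] = {0, 2, 3, 5};
    int   hColIdx[] = {0, 2, 1, 0, 2};
    float hVals[]   = {1.f, 2.f, 3.f, 4.f, 5.f};
    float hX[]      = {1.f, 1.f, 1.f};
    float hY[n];
    int   zero = 0;

    int *dRowPtr, *dColIdx, *dNext;
    float *dVals, *dX, *dY;
    cudaMalloc(&dRowPtr, sizeof(hRowPtr));
    cudaMalloc(&dColIdx, sizeof(hColIdx));
    cudaMalloc(&dVals,   sizeof(hVals));
    cudaMalloc(&dX,      sizeof(hX));
    cudaMalloc(&dY,      sizeof(hY));
    cudaMalloc(&dNext,   sizeof(int));
    cudaMemcpy(dRowPtr, hRowPtr, sizeof(hRowPtr), cudaMemcpyHostToDevice);
    cudaMemcpy(dColIdx, hColIdx, sizeof(hColIdx), cudaMemcpyHostToDevice);
    cudaMemcpy(dVals,   hVals,   sizeof(hVals),   cudaMemcpyHostToDevice);
    cudaMemcpy(dX,      hX,      sizeof(hX),      cudaMemcpyHostToDevice);
    cudaMemcpy(dNext,   &zero,   sizeof(int),     cudaMemcpyHostToDevice);

    csrSpmvWarpDynamic<<<1, 64>>>(n, dRowPtr, dColIdx, dVals, dX, dY, dNext);
    cudaMemcpy(hY, dY, sizeof(hY), cudaMemcpyDeviceToHost);
    printf("y = [%g %g %g]\n", hY[0], hY[1], hY[2]);  // expected: [3 3 9]
    return 0;
}

The atomic counter is what provides load balance on matrices with skewed row lengths: a warp that finishes a short row immediately claims another instead of being bound to a static row range. For an application such as PageRank, a kernel of this shape would simply be invoked once per iteration on the transition matrix.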
Pages: 69-86 (17 pages)
Related papers (records 21-30 of 50 shown)
  • [21] The Sliced COO format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs
    Dang, Hoang-Vu
    Schmidt, Bertil
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2012, 2012, 9: 57-66
  • [22] Sparse matrix-vector multiplication - final solution?
    Simecek, Ivan
    Tvrdik, Pavel
    PARALLEL PROCESSING AND APPLIED MATHEMATICS, 2008, 4967: 156-165
  • [23] On improving the performance of sparse matrix-vector multiplication
    White, JB
    Sadayappan, P
    FOURTH INTERNATIONAL CONFERENCE ON HIGH-PERFORMANCE COMPUTING, PROCEEDINGS, 1997: 66-71
  • [24] Cache-oblivious sparse matrix-vector multiplication by using sparse matrix partitioning methods
    Yzelman, A. N.
    Bisseling, Rob H.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2009, 31(4): 3128-3154
  • [25] Efficient FCM Computations Using Sparse Matrix-Vector Multiplication
    Puheim, Michal
    Vascak, Jan
    Machova, Kristina
    2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016: 4165-4170
  • [26] Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU
    Zhang, Jilin
    Liu, Enyi
    Wan, Jian
    Ren, Yongjian
    Yue, Miao
    Wang, Jue
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7(2): 473-482
  • [27] Adaptive sparse matrix representation for efficient matrix-vector multiplication
    Zardoshti, Pantea
    Khunjush, Farshad
    Sarbazi-Azad, Hamid
    JOURNAL OF SUPERCOMPUTING, 2016, 72(9): 3366-3386
  • [28] Communication balancing in parallel sparse matrix-vector multiplication
    Bisseling, RH
    Meesen, W
    ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2005, 21: 47-65
  • [29] Sparse matrix-vector multiplication on network-on-chip
    Sun, C-C
    Goetze, J.
    Jheng, H-Y
    Ruan, S-J
    ADVANCES IN RADIO SCIENCE, 2010, 8: 289-294
  • [30] Optimization by Runtime Specialization for Sparse Matrix-Vector Multiplication
    Kamin, Sam
    Garzaran, Maria Jesus
    Aktemur, Baris
    Xu, Danqing
    Yilmaz, Buse
    Chen, Zhongbo
    ACM SIGPLAN NOTICES, 2015, 50(3): 93-102