LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows

Cited by: 0
Authors
Yongchao Liu
Bertil Schmidt
Affiliations
[1] Georgia Institute of Technology, School of Computational Science & Engineering
[2] Johannes Gutenberg University Mainz, Institute of Computer Science
Source
Keywords
Sparse matrix-vector multiplication; Compressed sparse row; CUDA; GPU
DOI
Not available
CLC number
Subject classification code
Abstract
Compressed sparse row (CSR) is one of the most frequently used sparse matrix storage formats. However, the efficiency of existing CUDA-compatible CSR-based sparse matrix-vector multiplication (SpMV) implementations is relatively low. We address this issue by presenting LightSpMV, a parallelized CSR-based SpMV implementation programmed in CUDA C++. The algorithm achieves high speed by employing atomic and warp shuffle instructions to implement fine-grained dynamic distribution of matrix rows over vectors/warps as well as efficient vector dot product computation. Moreover, we propose a unified cache hit rate computation approach to consistently investigate the caching behavior of different SpMV kernels, which may deploy their data differently in the hierarchical memory space of CUDA-enabled GPUs. We have assessed LightSpMV using a set of sparse matrices and compared it to the CSR-based SpMV kernels in the top-performing CUSP, ViennaCL and cuSPARSE libraries. Our experimental results demonstrate that LightSpMV is superior to CUSP, ViennaCL and cuSPARSE on the same Kepler-based Tesla K40c GPU, running up to 2.63× and 2.65× faster than CUSP, up to 2.52× and 1.96× faster than ViennaCL, and up to 1.94× and 1.79× faster than cuSPARSE in single and double precision, respectively. In addition, when accelerating the PageRank graph application, LightSpMV remains consistently superior to these three counterparts. LightSpMV is open-source and publicly available at http://lightspmv.sourceforge.net.
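The vector-level scheme the abstract describes can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the LightSpMV kernel itself: the function name `csr_spmv_sim` and its parameters are hypothetical, a plain integer counter stands in for the atomically incremented row counter, and a list-based tree reduction stands in for the warp shuffle reduction. `vector_size` is assumed to be a power of two.

```python
def csr_spmv_sim(row_ptr, col_idx, values, x, vector_size=4, num_vectors=2):
    """Sequentially simulate vector-level CSR SpMV with dynamic row assignment.

    Hypothetical illustration only; vector_size must be a power of two.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    next_row = 0  # stands in for a global counter advanced by an atomic add

    # Simulated vectors take turns; each one fetches the next unprocessed row.
    while next_row < num_rows:
        for _ in range(num_vectors):
            if next_row >= num_rows:
                break
            row = next_row  # a real kernel would obtain this via an atomic add
            next_row += 1

            start, end = row_ptr[row], row_ptr[row + 1]
            # Each lane accumulates a strided partial sum over the row's nonzeros.
            partial = [0.0] * vector_size
            for lane in range(vector_size):
                for k in range(start + lane, end, vector_size):
                    partial[lane] += values[k] * x[col_idx[k]]

            # Tree reduction across lanes, mirroring a shuffle-down pattern.
            offset = vector_size // 2
            while offset > 0:
                for lane in range(offset):
                    partial[lane] += partial[lane + offset]
                offset //= 2
            y[row] = partial[0]
    return y


# CSR form of [[1, 0, 2], [0, 3, 0], [4, 5, 6]] multiplied by x = [1, 2, 3].
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv_sim(row_ptr, col_idx, values, x)
```

Because rows are handed out one at a time from a shared counter, a vector that finishes a short row immediately picks up the next one, which is the load-balancing idea behind dynamic row distribution.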
Pages: 69-86
Number of pages: 17
Related papers
50 in total
  • [1] LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows
    Liu, Yongchao
    Schmidt, Bertil
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2018, 90 (01): 69-86
  • [2] LightSpMV: Faster CSR-based Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs
    Liu, Yongchao
    Schmidt, Bertil
    PROCEEDINGS OF THE ASAP2015 2015 IEEE 26TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, 2015: 82-89
  • [3] Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication Using Compressed Sparse Blocks
    Buluc, Aydin
    Fineman, Jeremy T.
    Frigo, Matteo
    Gilbert, John R.
    Leiserson, Charles E.
    SPAA'09: PROCEEDINGS OF THE TWENTY-FIRST ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2009: 233-244
  • [4] A segment-based sparse matrix-vector multiplication on CUDA
    Feng, Xiaowen
    Jin, Hai
    Zheng, Ran
    Shao, Zhiyuan
    Zhu, Lei
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2014, 26 (01): 271-286
  • [5] Heterogeneous sparse matrix-vector multiplication via compressed sparse row format
    Lane, Phillip Allen
    Booth, Joshua Dennis
    PARALLEL COMPUTING, 2023, 115
  • [6] Sparse Matrix-Vector Multiplication Optimizations based on Matrix Bandwidth Reduction using NVIDIA CUDA
    Xu, Shiming
    Lin, Hai Xiang
    Xue, Wei
    PROCEEDINGS OF THE NINTH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS TO BUSINESS, ENGINEERING AND SCIENCE (DCABES 2010), 2010: 609-614
  • [7] CUDA-enabled Sparse Matrix-Vector Multiplication on GPUs using atomic operations
    Dang, Hoang-Vu
    Schmidt, Bertil
    PARALLEL COMPUTING, 2013, 39 (11): 737-750
  • [8] Sparse Matrix-Vector Multiplication on GPGPUs
    Filippone, Salvatore
    Cardellini, Valeria
    Barbieri, Davide
    Fanfarillo, Alessandro
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2017, 43 (04)
  • [9] Reducing Vector I/O for Faster GPU Sparse Matrix-Vector Multiplication
    Nguyen Quang Anh Pham
    Fan, Rui
    Wen, Yonggang
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015: 1043-1052
  • [10] GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication
    Tao, Yuan
    Deng, Yangdong
    Mu, Shuai
    Zhang, Zhenzhong
    Zhu, Mingfa
    Xiao, Limin
    Ruan, Li
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, 27 (14): 3771-3789