Reducing Vector I/O for Faster GPU Sparse Matrix-Vector Multiplication

被引:5
|
作者
Nguyen Quang Anh Pham [1 ]
Fan, Rui [1 ]
Wen, Yonggang [1 ]
机构
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
关键词
D O I
10.1109/IPDPS.2015.100
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Sparse matrix-vector multiplication (SpMV) is an important kernel used in solving many scientific and engineering problems. The massive parallelism of graphics processing units (GPUs) makes them well suited for SpMV computations. However, fully utilizing the power of GPUs is challenging because SpMV makes a large number of scattered memory accesses which saturate the GPU's memory bandwidth. Most previous works sought to address the bandwidth limitation by using efficient storage formats for the matrix. However, we show that for most matrices, a majority of the bandwidth is consumed by accesses to the vector. In this paper, we introduce two techniques to significantly decrease the I/O for vector accesses, by making novel use of the GPU's fast shared memory. A key advantage of our vector optimizations is that they are complementary to existing matrix I/O optimizations, so that it is possible to use both techniques in conjunction. Furthermore, combining the optimizations requires only minor code changes. We demonstrate how to combine our techniques with the widely used CUSP SpMV algorithm and the currently highest performing yaSpMV algorithm to significantly improve both algorithms' performance. We experimented with a wide range of matrices, and show that the modified version of CUSP on average reduces vector I/O by 37% and reduces the total I/O by 31%, while the modified version of yaSpMV reduces the vector and total I/O by 36% and 31%, resp. We improve CUSP's total throughput by 14% on average and up to 77% for certain matrices, and improve yaSpMV's throughput by 12% on average and 35% for some matrices.
引用
收藏
页码:1043 / 1052
页数:10
相关论文
共 50 条
  • [1] GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication
    Tao, Yuan
    Deng, Yangdong
    Mu, Shuai
    Zhang, Zhenzhong
    Zhu, Mingfa
    Xiao, Limin
    Ruan, Li
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, 27 (14): : 3771 - 3789
  • [2] Energy Evaluation of Sparse Matrix-Vector Multiplication on GPU
    Benatia, Akrem
    Ji, Weixing
    Wang, Yizhuo
    Shi, Feng
    2016 SEVENTH INTERNATIONAL GREEN AND SUSTAINABLE COMPUTING CONFERENCE (IGSC), 2016,
  • [3] Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU
    Zhang, Jilin
    Liu, Enyi
    Wan, Jian
    Ren, Yongjian
    Yue, Miao
    Wang, Jue
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (02): : 473 - 482
  • [4] A New Method of Sparse Matrix-Vector Multiplication on GPU
    Huan, Gao
    Qian, Zhang
    PROCEEDINGS OF 2012 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2012), 2012, : 954 - 958
  • [5] Adaptive diagonal sparse matrix-vector multiplication on GPU
    Gao, Jiaquan
    Xia, Yifei
    Yin, Renjie
    He, Guixia
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 157 : 287 - 302
  • [6] Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU
    Zhao, Zhixiang
    Wu, Yanxia
    Zhang, Guoyin
    Yang, Yiqing
    Hong, Ruize
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2025, 37 (4-5):
  • [7] Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
    Monakov, Alexander
    Lokhmotov, Anton
    Avetisyan, Arutyun
    HIGH PERFORMANCE EMBEDDED ARCHITECTURES AND COMPILERS, PROCEEDINGS, 2010, 5952 : 111 - +
  • [8] Faster Online Matrix-Vector Multiplication
    Larsen, Kasper Green
    Williams, Ryan
    PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2017, : 2182 - 2189
  • [9] Sparse Matrix-Vector Multiplication on GPGPUs
    Filippone, Salvatore
    Cardellini, Valeria
    Barbieri, Davide
    Fanfarillo, Alessandro
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2017, 43 (04):
  • [10] Shuffle Reduction Based Sparse Matrix-Vector Multiplication on Kepler GPU
    Yuan Tao
    Huang Zhi-Bin
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2016, 9 (10): : 99 - 106