Reducing Vector I/O for Faster GPU Sparse Matrix-Vector Multiplication

Cited by: 5
Authors
Nguyen Quang Anh Pham [1]
Fan, Rui [1]
Wen, Yonggang [1]
Affiliation
[1] Nanyang Technological University, School of Computer Engineering, Singapore, Singapore
DOI
10.1109/IPDPS.2015.100
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Sparse matrix-vector multiplication (SpMV) is an important kernel used in solving many scientific and engineering problems. The massive parallelism of graphics processing units (GPUs) makes them well suited for SpMV computations. However, fully utilizing the power of GPUs is challenging because SpMV makes a large number of scattered memory accesses, which saturate the GPU's memory bandwidth. Most previous work sought to address the bandwidth limitation by using efficient storage formats for the matrix. However, we show that for most matrices, a majority of the bandwidth is consumed by accesses to the vector. In this paper, we introduce two techniques that significantly decrease the I/O for vector accesses by making novel use of the GPU's fast shared memory. A key advantage of our vector optimizations is that they are complementary to existing matrix I/O optimizations, so both can be used in conjunction. Furthermore, combining the optimizations requires only minor code changes. We demonstrate how to combine our techniques with the widely used CUSP SpMV algorithm and with yaSpMV, currently the highest-performing SpMV algorithm, to significantly improve the performance of both. We experimented with a wide range of matrices and show that the modified version of CUSP on average reduces vector I/O by 37% and total I/O by 31%, while the modified version of yaSpMV reduces vector and total I/O by 36% and 31%, respectively. We improve CUSP's total throughput by 14% on average and by up to 77% for certain matrices, and improve yaSpMV's throughput by 12% on average and by 35% for some matrices.
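To make the distinction between matrix I/O and vector I/O concrete, the following Python sketch computes a CSR SpMV together with a simplified byte count of global-memory traffic. It is an illustrative model under stated assumptions, not the authors' two techniques and not the CUSP or yaSpMV kernels. It assumes 8-byte values, 4-byte indices, and no hardware caching. It also assumes a hypothetical scheme in which each group of `rows_per_block` rows loads every distinct vector entry it needs exactly once (for example, into shared memory). The function names `spmv_csr` and `io_bytes` are hypothetical.

```python
from typing import List, Tuple


def spmv_csr(indptr: List[int], indices: List[int],
             data: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # Each nonzero reads one (possibly scattered) vector entry x[j].
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def io_bytes(indptr: List[int], indices: List[int], rows_per_block: int,
             value_bytes: int = 8, index_bytes: int = 4) -> Tuple[int, int, int]:
    """Simplified traffic model (assumption: no hardware cache).

    Returns (matrix_bytes, naive_vector_bytes, blocked_vector_bytes).
    Output writes (n_rows * value_bytes) are the same in both schemes
    and are omitted.
    """
    n_rows = len(indptr) - 1
    nnz = len(indices)
    # CSR matrix traffic: values + column indices + row pointers.
    matrix_bytes = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    # Naive: every nonzero fetches its vector entry from global memory.
    naive_vector = nnz * value_bytes
    # Hypothetical blocked scheme: each row group fetches each distinct
    # column's vector entry once and reuses it for all its rows.
    blocked_vector = 0
    for start in range(0, n_rows, rows_per_block):
        end = min(start + rows_per_block, n_rows)
        distinct_cols = set(indices[indptr[start]:indptr[end]])
        blocked_vector += len(distinct_cols) * value_bytes
    return matrix_bytes, naive_vector, blocked_vector


if __name__ == "__main__":
    # 4x4 tridiagonal example matrix in CSR form.
    indptr = [0, 2, 5, 8, 10]
    indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
    data = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
    print(spmv_csr(indptr, indices, data, [1.0, 2.0, 3.0, 4.0]))  # [6.0, 12.0, 18.0, 19.0]
    print(io_bytes(indptr, indices, rows_per_block=2))  # (140, 80, 48)
```

In this toy example, grouping rows in pairs lowers the modeled vector traffic from 80 to 48 bytes because neighboring rows share column indices. The model counts only the bytes actually used. On real GPUs, scattered loads are served in whole memory transactions, so actual vector traffic for irregular matrices is usually higher than this estimate.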
Pages: 1043-1052
Number of pages: 10
Related papers
50 in total
  • [21] Node aware sparse matrix-vector multiplication
    Bienz, Amanda
    Gropp, William D.
    Olson, Luke N.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2019, 130 : 166 - 178
  • [22] STRUCTURED SPARSE MATRIX-VECTOR MULTIPLICATION ON A MASPAR
    DEHN, T
    EIERMANN, M
    GIEBERMANN, K
    SPERLING, V
    ZEITSCHRIFT FUR ANGEWANDTE MATHEMATIK UND MECHANIK, 1994, 74 (06) : T534 - T538
  • [23] Performance Aspects of Sparse Matrix-Vector Multiplication
    Simecek, I.
    ACTA POLYTECHNICA, 2006, 46 (03) : 3 - 8
  • [24] On improving the performance of sparse matrix-vector multiplication
    White, JB
    Sadayappan, P
    FOURTH INTERNATIONAL CONFERENCE ON HIGH-PERFORMANCE COMPUTING, PROCEEDINGS, 1997 : 66 - 71
  • [25] Sparse matrix-vector multiplication - Final solution?
    Simecek, Ivan
    Tvrdik, Pavel
    PARALLEL PROCESSING AND APPLIED MATHEMATICS, 2008, 4967 : 156 - 165
  • [26] An I/O Bandwidth-Sensitive Sparse Matrix-Vector Multiplication Engine on FPGAs
    Sun, Song
    Monga, Madhu
    Jones, Phillip H.
    Zambreno, Joseph
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2012, 59 (01) : 113 - 123
  • [27] High-Performance Matrix-Vector Multiplication on the GPU
    Sorensen, Hans Henrik Brandenborg
    EURO-PAR 2011: PARALLEL PROCESSING WORKSHOPS, PT I, 2012, 7155 : 377 - 386
  • [28] A GPU Framework for Sparse Matrix Vector Multiplication
    Neelima, B.
    Reddy, G. Ram Mohana
    Raghavendra, Prakash S.
    2014 IEEE 13TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC), 2014 : 51 - 58
  • [29] A TASK-SCHEDULING APPROACH FOR EFFICIENT SPARSE SYMMETRIC MATRIX-VECTOR MULTIPLICATION ON A GPU
    Mironowicz, P.
    Dziekonski, A.
    Mrozowski, M.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (06) : C643 - C666
  • [30] A New Segmentation-Based GPU-Accelerated Sparse Matrix-Vector Multiplication
    He, Kai
    Tan, Sheldon X-D
    Tlelo-Cuautle, Esteban
    Wang, Hai
    Tang, He
    2014 IEEE 57TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2014 : 1013 - 1016