Reducing Vector I/O for Faster GPU Sparse Matrix-Vector Multiplication

Times cited: 5

Authors:

Nguyen Quang Anh Pham [1]

Fan, Rui [1]

Wen, Yonggang [1]

Affiliation:

[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore

Source:

2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) | 2015

DOI:

10.1109/IPDPS.2015.100

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Sparse matrix-vector multiplication (SpMV) is an important kernel in solving many scientific and engineering problems. The massive parallelism of graphics processing units (GPUs) makes them well suited to SpMV. However, fully exploiting GPUs is challenging because SpMV makes a large number of scattered memory accesses, which saturate the GPU's memory bandwidth. Most previous work addressed this bandwidth limitation by using efficient storage formats for the matrix. However, we show that for most matrices, the majority of the bandwidth is consumed by accesses to the vector. In this paper, we introduce two techniques that significantly decrease the I/O for vector accesses by making novel use of the GPU's fast shared memory. A key advantage of our vector optimizations is that they are complementary to existing matrix I/O optimizations, so both kinds of optimization can be used together. Furthermore, combining them requires only minor code changes. We demonstrate how to combine our techniques with the widely used CUSP SpMV algorithm and with yaSpMV, currently the highest-performing SpMV algorithm, significantly improving the performance of both. We experimented with a wide range of matrices and show that the modified version of CUSP reduces vector I/O by 37% and total I/O by 31% on average, while the modified version of yaSpMV reduces vector and total I/O by 36% and 31%, respectively. We improve CUSP's total throughput by 14% on average and by up to 77% for certain matrices, and improve yaSpMV's throughput by 12% on average and by 35% for some matrices.
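To make the bandwidth argument concrete, here is a rough traffic model for a baseline CSR SpMV. It is a minimal sketch under stated assumptions, not the authors' optimization or their measurement methodology: the 32-byte memory sector, the one-thread-per-row ("scalar CSR") mapping with 32-thread warps, float64 values with int32 indices, and the names `csr_spmv` and `io_estimate` are all hypothetical choices for illustration. Matrix arrays are charged their ideal streamed size, while each gather of `x[col]` is charged a full sector unless another lane of the same warp touches that sector in the same step.

```python
# Toy I/O model for a scalar CSR SpMV (assumptions, not measured GPU behavior).
SECTOR_BYTES = 32   # assumed DRAM transaction granularity
WARP_SIZE = 32      # assumed number of rows processed in lockstep
VAL_BYTES = 8       # float64 matrix values and vector entries
IDX_BYTES = 4       # int32 column indices and row pointers


def csr_spmv(row_ptr, col_idx, vals, x):
    """Reference y = A @ x for a matrix stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for j in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[j] * x[col_idx[j]]  # scattered gather from x
        y[i] = acc
    return y


def io_estimate(row_ptr, col_idx):
    """Return (matrix_bytes, vector_bytes) read under the toy model.

    Matrix bytes: ideal streamed size of vals, col_idx and row_ptr.
    Vector bytes: per warp and per step k, the number of distinct
    32-byte sectors of x touched by the warp's k-th gathers. Nothing
    is cached across steps or across warps.
    """
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    matrix_bytes = nnz * (VAL_BYTES + IDX_BYTES) + (n_rows + 1) * IDX_BYTES
    vector_bytes = 0
    for w in range(0, n_rows, WARP_SIZE):
        rows = range(w, min(w + WARP_SIZE, n_rows))
        longest = max(row_ptr[r + 1] - row_ptr[r] for r in rows)
        for k in range(longest):
            # Sectors of x requested by lanes that still have a k-th nonzero.
            sectors = {
                col_idx[row_ptr[r] + k] * VAL_BYTES // SECTOR_BYTES
                for r in rows
                if row_ptr[r] + k < row_ptr[r + 1]
            }
            vector_bytes += len(sectors) * SECTOR_BYTES
    return matrix_bytes, vector_bytes


if __name__ == "__main__":
    row_ptr = list(range(33))               # 32 rows, one nonzero each
    scattered = [4 * i for i in range(32)]  # every gather hits a new sector
    adjacent = list(range(32))              # 4 gathers share each sector
    print(io_estimate(row_ptr, scattered))  # (516, 1024)
    print(io_estimate(row_ptr, adjacent))   # (516, 256)
```

Under this toy model, 32 rows that each gather one `x` element from a different sector move 1,024 bytes of vector data against 516 bytes of idealized matrix data, while the same number of gathers on adjacent columns moves only 256 bytes. The model deliberately ignores caches and output writes, so it only illustrates why the locality of vector accesses can dominate traffic.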

Pages: 1043-1052

Number of pages: 10
