Sparse Matrix-Vector Multiplication Optimizations based on Matrix Bandwidth Reduction using NVIDIA CUDA

Cited: 7
Authors
Xu, Shiming [1]
Lin, Hai Xiang [1]
Xue, Wei [2]
Affiliations
[1] Delft Univ Technol, Delft Inst Appl Math, Delft, Netherlands
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Keywords
SpMV; GP-GPU; NVIDIA CUDA; RCM
DOI
10.1109/DCABES.2010.162
Chinese Library Classification
TP39 [Computer Applications]
Discipline Code
081203; 0835
Abstract
In this paper we propose optimizations of sparse matrix-vector multiplication (SpMV) with CUDA based on matrix bandwidth/profile reduction techniques. The computational time required to access the dense vector is decoupled from the rest of the SpMV computation. By reducing the matrix profile, the time spent accessing the dense vector is reduced by 17% (SP) and 24% (DP). The reduced matrix bandwidth enables column index information to be compressed into shorter formats, yielding a 17% (SP) and 10% (DP) reduction in the execution time for accessing matrix data under the ELLPACK format. The overall SpMV speedups over the whole matrix test suite are 16% and 12.6%. The optimizations proposed in this paper can be combined with other SpMV optimizations such as register blocking.
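To illustrate the column-index compression described in the abstract, the following is a minimal sketch of an ELLPACK SpMV kernel, assuming a column-major ELLPACK layout and that RCM reordering has brought the matrix bandwidth below 2^15, so each column index can be stored as a 16-bit offset from its row index. This is not the authors' kernel; all names (spmv_ell_short_cols, ell_val, ell_col_off, max_nnz_per_row) are hypothetical.

```cuda
// Sketch: ELLPACK SpMV with 16-bit column offsets, assuming bandwidth < 32768
// after RCM reordering. One thread per row; column-major ELLPACK storage.
#include <cuda_runtime.h>
#include <stdint.h>

__global__ void spmv_ell_short_cols(int n_rows, int max_nnz_per_row,
                                    const float   *ell_val,     // values, column-major: [max_nnz_per_row][n_rows]
                                    const int16_t *ell_col_off, // column index stored as (col - row)
                                    const float   *x,           // dense input vector
                                    float         *y)           // output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n_rows) return;

    float sum = 0.0f;
    for (int k = 0; k < max_nnz_per_row; ++k) {
        // Column-major layout gives coalesced loads of ell_val/ell_col_off per warp.
        int idx = k * n_rows + row;
        int16_t off = ell_col_off[idx];
        float   v   = ell_val[idx];
        // Padding entries are assumed to have v == 0 and off == 0, so they
        // contribute nothing. The true column is row + off, which is valid
        // because |col - row| <= bandwidth < 32768 after RCM reordering.
        sum += v * x[row + off];
    }
    y[row] = sum;
}
```

Halving the column-index footprint (16-bit offsets instead of 32-bit absolute indices) is the source of the reduced matrix-data access time reported in the abstract.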
Pages: 609-614
Page count: 6