Sparse Matrix-Vector Multiplication Optimizations based on Matrix Bandwidth Reduction using NVIDIA CUDA

Cited by: 7

Authors:

Xu, Shiming [1]

Lin, Hai Xiang [1]

Xue, Wei [2]

Affiliations:

[1] Delft Univ Technol, Delft Inst Appl Math, Delft, Netherlands

[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE NINTH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS TO BUSINESS, ENGINEERING AND SCIENCE (DCABES 2010) | 2010

Keywords:

SpMV; GP-GPU; NVIDIA CUDA; RCM;

DOI:

10.1109/DCABES.2010.162

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification code:

081203; 0835

Abstract:

In this paper we propose optimizing sparse matrix-vector multiplication (SpMV) with CUDA based on matrix bandwidth/profile reduction techniques. The computational time required to access the dense vector is decoupled from the SpMV computation. By reducing the matrix profile, the time required to access the dense vector is reduced by 17% (for SP) and 24% (for DP). Reduced matrix bandwidth enables compression of the column index information into shorter formats, resulting in a 17% (for SP) and 10% (for DP) reduction in execution time for accessing matrix data under the ELLPACK format. The overall speedup for SpMV is 16% and 12.6% for the whole matrix test suite. The optimization proposed in this paper can be combined with other SpMV optimizations such as register blocking.
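The column index compression mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact compressed format, so the code below assumes one plausible scheme: once the bandwidth is small, each ELLPACK column index is stored as a 16-bit signed offset from the row's diagonal instead of a full 32-bit index. The function names (`to_ell`, `compress_cols`, `ell_spmv`) are hypothetical, and the code runs on the CPU in plain Python rather than as a CUDA kernel; it is not the authors' implementation.

```python
from array import array


def to_ell(dense):
    """Pack a square dense matrix (list of rows) into ELLPACK form.

    Returns (vals, cols): every row is padded to the same width. Padding
    slots hold value 0.0 and point at the diagonal, so they add nothing.
    """
    n = len(dense)
    width = max(sum(1 for v in row if v != 0) for row in dense)
    vals = [[0.0] * width for _ in range(n)]
    cols = [[i] * width for i in range(n)]
    for i, row in enumerate(dense):
        k = 0
        for j, v in enumerate(row):
            if v != 0:
                vals[i][k] = float(v)
                cols[i][k] = j
                k += 1
    return vals, cols


def compress_cols(cols):
    """Assumed scheme: store each column index as a 16-bit offset (j - i).

    This only works when the matrix bandwidth fits in a signed 16-bit
    integer, which is what a bandwidth-reducing reordering aims to enable.
    """
    bandwidth = max((abs(j - i) for i, row in enumerate(cols) for j in row),
                    default=0)
    if bandwidth > 32767:
        raise ValueError("bandwidth too large for 16-bit offsets")
    offsets = [array("h", (j - i for j in row)) for i, row in enumerate(cols)]
    return offsets, bandwidth


def ell_spmv(vals, offsets, x):
    """Compute y = A x, rebuilding each column index as row + offset."""
    return [
        sum(v * x[i + d] for v, d in zip(vals[i], offsets[i]))
        for i in range(len(vals))
    ]
```

For a tridiagonal matrix the bandwidth is 1, so every offset is -1, 0, or 1 and fits easily in 16 bits. Matrices with scattered nonzeros would first need a reordering such as RCM (listed in the keywords) before this kind of compression applies.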


Pages: 609-614

Page count: 6

50 items in total

[31] Efficient FCM Computations Using Sparse Matrix-Vector Multiplication
Puheim, Michal
Vascak, Jan
Machova, Kristina
2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016: 4165-4170
[32] On sparse matrix-vector multiplication with FPGA-based system
ElGindy, H
Shue, YL
10TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2002: 273-274
[33] Merge-based Parallel Sparse Matrix-Vector Multiplication
Merrill, Duane
Garland, Michael
SC '16: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2016: 678-689
[34] CACHE-OBLIVIOUS SPARSE MATRIX-VECTOR MULTIPLICATION BY USING SPARSE MATRIX PARTITIONING METHODS
Yzelman, A. N.
Bisseling, Rob H.
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2009, 31 (04): 3128-3154
[35] An I/O Bandwidth-Sensitive Sparse Matrix-Vector Multiplication Engine on FPGAs
Sun, Song
Monga, Madhu
Jones, Phillip H.
Zambreno, Joseph
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2012, 59 (01): 113-123
[36] Communication balancing in parallel sparse matrix-vector multiplication
Bisseling, RH
Meesen, W
ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2005, 21: 47-65
[37] Sparse matrix-vector multiplication on network-on-chip
Sun, C-C
Goetze, J.
Jheng, H-Y
Ruan, S-J
ADVANCES IN RADIO SCIENCE, 2010, 8: 289-294
[38] Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU
Zhang, Jilin
Liu, Enyi
Wan, Jian
Ren, Yongjian
Yue, Miao
Wang, Jue
APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (02): 473-482
[39] Energy Evaluation of Sparse Matrix-Vector Multiplication on GPU
Benatia, Akrem
Ji, Weixing
Wang, Yizhuo
Shi, Feng
2016 SEVENTH INTERNATIONAL GREEN AND SUSTAINABLE COMPUTING CONFERENCE (IGSC), 2016
[40] Autotuning Runtime Specialization for Sparse Matrix-Vector Multiplication
Yilmaz, Buse
Aktemur, Baris
Garzaran, Maria J.
Kamin, Sam
Kirac, Furkan
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2016, 13 (01)
