Sparse Matrix-Vector Multiplication Optimizations based on Matrix Bandwidth Reduction using NVIDIA CUDA

Cited by: 7
Authors
Xu, Shiming [1]
Lin, Hai Xiang [1]
Xue, Wei [2]
Affiliations
[1] Delft Univ Technol, Delft Inst Appl Math, Delft, Netherlands
[2] Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Keywords
SpMV; GP-GPU; NVIDIA CUDA; RCM
DOI
10.1109/DCABES.2010.162
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this paper we propose optimizing sparse matrix-vector multiplication (SpMV) with CUDA using matrix bandwidth/profile reduction techniques. The time required to access the dense vector is decoupled from the SpMV computation. By reducing the matrix profile, the time required to access the dense vector is reduced by 17% for single precision (SP) and 24% for double precision (DP). Reduced matrix bandwidth enables compression of the column index information into shorter formats, resulting in a 17% (SP) and 10% (DP) reduction in execution time for accessing matrix data under the ELLPACK format. The overall speedup for SpMV is 16% and 12.6% for the whole matrix test suite. The optimization proposed in this paper can be combined with other SpMV optimizations such as register blocking.
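The index-compression idea in the abstract can be illustrated with a minimal CPU-side sketch. It is not the authors' CUDA implementation: the function names (`bandwidth`, `to_ell_offsets`, `spmv_ell`), the choice of storing each column index as a signed 8-bit offset from the row index, and the row-major layout are all assumptions made for illustration. The point it shows is that once a reordering such as RCM has made the bandwidth small, every `col - row` offset fits in a narrow integer type, so less index data has to be read per nonzero.

```python
from array import array


def bandwidth(rows):
    """Largest |col - row| over all stored nonzeros (0 for an empty matrix)."""
    return max((abs(c - r) for r, row in enumerate(rows) for c, _ in row),
               default=0)


def to_ell_offsets(rows, typecode="b"):
    """Pack a square matrix into ELLPACK with offset-encoded column indices.

    rows[r] is a list of (col, value) pairs for row r. Each column index is
    stored as the offset (col - row). The default typecode "b" is a signed
    8-bit integer, so packing raises OverflowError unless the bandwidth is
    at most 127. Layout is row-major for readability; GPU ELLPACK kernels
    usually use a column-major layout instead (not shown here).
    """
    width = max((len(row) for row in rows), default=0)
    offsets = array(typecode)
    values = array("d")
    for r, row in enumerate(rows):
        for c, v in row:
            offsets.append(c - r)
            values.append(v)
        # Pad short rows: offset 0 points at x[r], and value 0.0 cancels it.
        for _ in range(width - len(row)):
            offsets.append(0)
            values.append(0.0)
    return len(rows), width, offsets, values


def spmv_ell(n_rows, width, offsets, values, x):
    """Compute y = A @ x from the offset-encoded ELLPACK arrays."""
    y = [0.0] * n_rows
    for r in range(n_rows):
        base = r * width
        acc = 0.0
        for k in range(width):
            acc += values[base + k] * x[r + offsets[base + k]]
        y[r] = acc
    return y


# Hypothetical 4x4 tridiagonal example (bandwidth 1).
tridiag = [
    [(0, 2.0), (1, -1.0)],
    [(0, -1.0), (1, 2.0), (2, -1.0)],
    [(1, -1.0), (2, 2.0), (3, -1.0)],
    [(2, -1.0), (3, 2.0)],
]
ell = to_ell_offsets(tridiag)
y = spmv_ell(*ell, [1.0, 2.0, 3.0, 4.0])
```

In this sketch a matrix whose bandwidth exceeds the range of the chosen integer type fails at packing time rather than silently producing wrong indices; a wider typecode such as `"h"` (at least 16 bits) would be the fallback.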
Pages: 609-614
Number of pages: 6