A MEMORY EFFICIENT AND FAST SPARSE MATRIX VECTOR PRODUCT ON A GPU

Cited by: 56
Authors
Dziekonski, A. [1 ]
Lamecki, A. [1 ]
Mrozowski, M. [1 ]
Affiliations
[1] Gdansk Univ Technol GUT, Fac Elect Telecommun & Informat ETI, WiComm Ctr Excellence, PL-80233 Gdansk, Poland
Keywords
Finite-element method; FDTD method; scattering; algorithm; units
DOI
10.2528/PIER11031607
Chinese Library Classification (CLC)
TM (Electrical Engineering); TN (Electronics and Communication Technology)
Discipline Codes
0808; 0809
Abstract
This paper proposes a new sparse matrix storage format that allows an efficient implementation of the sparse matrix-vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it combines a low memory footprint with good throughput. The new format, called Sliced ELLR-T, has been designed specifically to accelerate the iterative solution of large, sparse, complex-valued systems of linear equations arising in computational electromagnetics. Numerical tests show that the new implementation reaches 69 GFLOPS in complex single-precision arithmetic. Compared to an optimized six-core Central Processing Unit (CPU, Intel Xeon 5680), this amounts to a speedup by a factor of six. In terms of speed, the new format matches the fastest format published so far, yet it does not introduce redundant zero elements that must be stored to ensure fast memory access. Compared to previously published solutions, significantly larger problems can therefore be handled on low-cost commodity GPUs with a limited amount of on-board memory.
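The abstract describes Sliced ELLR-T only at a high level. As an illustrative sketch, not the paper's exact GPU implementation, the key idea of a sliced ELLPACK-R-style layout can be modeled on the CPU: rows are grouped into slices, each slice is padded only to the longest row within that slice (reducing the redundant zeros of plain ELLPACK), and a per-row length array lets the kernel skip the padding. All names here (`to_sliced_ellr`, `spmv_sliced_ellr`, `slice_size`) are assumptions for the sketch.

```python
import numpy as np

def to_sliced_ellr(dense, slice_size=4):
    """Pack a dense matrix into a sliced ELLPACK-R-like layout.

    Rows are grouped into slices of `slice_size`; each slice stores its
    values and column indices padded only to the longest row *within that
    slice*, plus an ELLPACK-R row-length array to skip the padding.
    """
    n = dense.shape[0]
    slices = []
    for s in range(0, n, slice_size):
        block = dense[s:s + slice_size]
        cols = [np.nonzero(row)[0] for row in block]
        row_len = np.array([len(c) for c in cols])   # per-row nonzero counts
        width = max(1, int(row_len.max()))           # per-slice padding width
        vals = np.zeros((len(block), width))
        idx = np.zeros((len(block), width), dtype=int)
        for i, c in enumerate(cols):
            vals[i, :len(c)] = block[i, c]
            idx[i, :len(c)] = c
        slices.append((vals, idx, row_len))
    return slices

def spmv_sliced_ellr(slices, x):
    """Reference (CPU) sparse matrix-vector product over the sliced layout."""
    parts = []
    for vals, idx, row_len in slices:
        y = np.zeros(vals.shape[0])
        for i in range(vals.shape[0]):
            k = row_len[i]                 # skip padding via the row-length array
            y[i] = vals[i, :k] @ x[idx[i, :k]]
        parts.append(y)
    return np.concatenate(parts)
```

On a GPU, each slice would map to a thread block (with T threads cooperating per row in ELLR-T), and the per-slice padding keeps memory accesses coalesced without the global padding cost of plain ELLPACK.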
Pages
49-63 (15 pages)
Related Papers
50 in total
  • [21] Performance improvement of sparse matrix vector product on vector machines
    Tiyyagura, Sunil R.
    Kuester, Uwe
    Borowski, Stefan
    COMPUTATIONAL SCIENCE - ICCS 2006, PT 1, PROCEEDINGS, 2006, 3991 : 196 - 203
  • [22] Leveraging Memory Copy Overlap for Efficient Sparse Matrix-Vector Multiplication on GPUs
    Zeng, Guangsen
    Zou, Yi
    ELECTRONICS, 2023, 12 (17)
  • [23] SparseP: Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures
    Giannoula, Christina
    Fernandez, Ivan
    Gomez-Luna, Juan
    Koziris, Nectarios
    Goumas, Georgios
    Mutlu, Onur
    2022 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2022), 2022, : 288 - 291
  • [24] Efficient dense matrix-vector multiplication on GPU
    He, Guixia
    Gao, Jiaquan
    Wang, Jun
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (19):
  • [25] Reducing Vector I/O for Faster GPU Sparse Matrix-Vector Multiplication
    Nguyen Quang Anh Pham
    Fan, Rui
    Wen, Yonggang
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 1043 - 1052
  • [26] Block strategy and adaptive storage for sparse matrix–vector multiplication on GPU
    Zhixiang Zhao
    Yanxia Wu
    Guoyin Zhang
    Yiqing Yang
    Haibo Liu
    Cluster Computing, 2025, 28 (5)
  • [27] VCSR: An Efficient GPU Memory-Aware Sparse Format
    Karimi, Elmira
    Agostini, Nicolas Bohm
    Dong, Shi
    Kaeli, David
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3977 - 3989
  • [28] Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU
    Zhao, Zhixiang
    Wu, Yanxia
    Zhang, Guoyin
    Yang, Yiqing
    Hong, Ruize
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2025, 37 (4-5):
  • [29] Adaptive sparse matrix representation for efficient matrix–vector multiplication
    Pantea Zardoshti
    Farshad Khunjush
    Hamid Sarbazi-Azad
    The Journal of Supercomputing, 2016, 72 : 3366 - 3386
  • [30] Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
    Monakov, Alexander
    Lokhmotov, Anton
    Avetisyan, Arutyun
    HIGH PERFORMANCE EMBEDDED ARCHITECTURES AND COMPILERS, PROCEEDINGS, 2010, 5952 : 111 - +