A MEMORY EFFICIENT AND FAST SPARSE MATRIX VECTOR PRODUCT ON A GPU

Cited by: 56
Authors
Dziekonski, A. [1 ]
Lamecki, A. [1 ]
Mrozowski, M. [1 ]
Affiliations
[1] Gdansk Univ Technol GUT, Fac Elect Telecommun & Informat ETI, WiComm Ctr Excellence, PL-80233 Gdansk, Poland
Keywords
Finite-element method; FDTD method; scattering; algorithm; units
DOI
10.2528/PIER11031607
Chinese Library Classification (CLC)
TM (Electrical Engineering); TN (Electronics and Communication Technology)
Discipline Codes
0808; 0809
Abstract
This paper proposes a new sparse matrix storage format that allows an efficient implementation of the sparse matrix-vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it combines a low memory footprint with good throughput. The new format, called Sliced ELLR-T, has been designed specifically to accelerate the iterative solution of large, sparse, complex-valued systems of linear equations arising in computational electromagnetics. Numerical tests show that the new implementation reaches 69 GFLOPS in complex single-precision arithmetic. Compared to an optimized six-core Central Processing Unit (CPU, Intel Xeon 5680), this amounts to a speedup by a factor of six. In terms of speed, the new format matches the fastest format published so far, yet it does not introduce redundant zero elements that must be stored to ensure fast memory access. Compared to previously published solutions, significantly larger problems can therefore be handled on low-cost commodity GPUs with a limited amount of on-board memory.
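The abstract describes Sliced ELLR-T only at a high level. As an illustrative sketch, not the paper's exact GPU implementation, the key idea of a sliced ELLPACK-R-style layout can be modeled on the CPU: rows are grouped into slices, each slice is padded only to the longest row within that slice (reducing the redundant zeros of plain ELLPACK), and a per-row length array lets the kernel skip the padding. All names here (`to_sliced_ellr`, `spmv_sliced_ellr`, `slice_size`) are assumptions for the sketch.

```python
import numpy as np

def to_sliced_ellr(dense, slice_size=4):
    """Pack a dense matrix into a sliced ELLPACK-R-like layout.

    Rows are grouped into slices of `slice_size`; each slice stores its
    values and column indices padded only to the longest row *within that
    slice*, plus an ELLPACK-R row-length array to skip the padding.
    """
    n = dense.shape[0]
    slices = []
    for s in range(0, n, slice_size):
        block = dense[s:s + slice_size]
        cols = [np.nonzero(row)[0] for row in block]
        row_len = np.array([len(c) for c in cols])   # per-row nonzero counts
        width = max(1, int(row_len.max()))           # per-slice padding width
        vals = np.zeros((len(block), width))
        idx = np.zeros((len(block), width), dtype=int)
        for i, c in enumerate(cols):
            vals[i, :len(c)] = block[i, c]
            idx[i, :len(c)] = c
        slices.append((vals, idx, row_len))
    return slices

def spmv_sliced_ellr(slices, x):
    """Reference (CPU) sparse matrix-vector product over the sliced layout."""
    parts = []
    for vals, idx, row_len in slices:
        y = np.zeros(vals.shape[0])
        for i in range(vals.shape[0]):
            k = row_len[i]                 # skip padding via the row-length array
            y[i] = vals[i, :k] @ x[idx[i, :k]]
        parts.append(y)
    return np.concatenate(parts)
```

On a GPU, each slice would map to a thread block (with T threads cooperating per row in ELLR-T), and the per-slice padding keeps memory accesses coalesced without the global padding cost of plain ELLPACK.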
Pages
49-63 (15 pages)
Related Papers
50 in total
  • [21] Performance improvement of sparse matrix vector product on vector machines
    Tiyyagura, Sunil R.
    Kuester, Uwe
    Borowski, Stefan
    COMPUTATIONAL SCIENCE - ICCS 2006, PT 1, PROCEEDINGS, 2006, 3991 : 196 - 203
  • [22] Leveraging Memory Copy Overlap for Efficient Sparse Matrix-Vector Multiplication on GPUs
    Zeng, Guangsen
    Zou, Yi
    ELECTRONICS, 2023, 12 (17)
  • [23] SparseP: Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures
    Giannoula, Christina
    Fernandez, Ivan
    Gomez-Luna, Juan
    Koziris, Nectarios
    Goumas, Georgios
    Mutlu, Onur
    2022 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI (ISVLSI 2022), 2022, : 288 - 291
  • [24] Efficient dense matrix-vector multiplication on GPU
    He, Guixia
    Gao, Jiaquan
    Wang, Jun
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (19):
  • [25] Reducing Vector I/O for Faster GPU Sparse Matrix-Vector Multiplication
    Nguyen Quang Anh Pham
    Fan, Rui
    Wen, Yonggang
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 1043 - 1052
  • [26] Block strategy and adaptive storage for sparse matrix–vector multiplication on GPU
    Zhixiang Zhao
    Yanxia Wu
    Guoyin Zhang
    Yiqing Yang
    Haibo Liu
    Cluster Computing, 2025, 28 (5)
  • [27] VCSR: An Efficient GPU Memory-Aware Sparse Format
    Karimi, Elmira
    Agostini, Nicolas Bohm
    Dong, Shi
    Kaeli, David
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3977 - 3989
  • [28] Recursive Hybrid Compression for Sparse Matrix-Vector Multiplication on GPU
    Zhao, Zhixiang
    Wu, Yanxia
    Zhang, Guoyin
    Yang, Yiqing
    Hong, Ruize
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2025, 37 (4-5):
  • [29] Adaptive sparse matrix representation for efficient matrix–vector multiplication
    Pantea Zardoshti
    Farshad Khunjush
    Hamid Sarbazi-Azad
    The Journal of Supercomputing, 2016, 72 : 3366 - 3386
  • [30] Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
    Monakov, Alexander
    Lokhmotov, Anton
    Avetisyan, Arutyun
    HIGH PERFORMANCE EMBEDDED ARCHITECTURES AND COMPILERS, PROCEEDINGS, 2010, 5952 : 111 - +