A MEMORY EFFICIENT AND FAST SPARSE MATRIX VECTOR PRODUCT ON A GPU

Cited by: 56
Authors
Dziekonski, A. [1 ]
Lamecki, A. [1 ]
Mrozowski, M. [1 ]
Affiliations
[1] Gdansk Univ Technol GUT, Fac Elect Telecommun & Informat ETI, WiComm Ctr Excellence, PL-80233 Gdansk, Poland
Keywords
FINITE-ELEMENT-METHOD; FDTD METHOD; SCATTERING; ALGORITHM; UNITS;
DOI
10.2528/PIER11031607
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Code
0808; 0809;
Abstract
This paper proposes a new sparse matrix storage format that allows an efficient implementation of the sparse matrix-vector product on a Fermi Graphics Processing Unit (GPU). Unlike previous formats, it combines a low memory footprint with high throughput. The new format, called Sliced ELLR-T, has been designed specifically to accelerate the iterative solution of the large, sparse, complex-valued systems of linear equations that arise in computational electromagnetics. Numerical tests show that the new implementation reaches 69 GFLOPS in complex single-precision arithmetic, a sixfold speedup over an optimized six-core Central Processing Unit (CPU) (Intel Xeon 5680). In terms of speed, the new format matches the fastest format published so far, yet it does not introduce the redundant zero elements that must otherwise be stored to ensure fast memory access. Compared to previously published solutions, significantly larger problems can therefore be handled on low-cost commodity GPUs with a limited amount of on-board memory.
Pages: 49-63
Page count: 15
Related papers
50 records in total
  • [1] Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU
    Yang, Carl
    Wang, Yangzihao
    Owens, John D.
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015, : 841 - 847
  • [2] A Fast Sparse Block Circulant Matrix Vector Product
    Romero, Eloy
    Tomas, Andres
    Soriano, Antonio
    Blanquer, Ignacio
    EURO-PAR 2014 PARALLEL PROCESSING, 2014, 8632 : 548 - 559
  • [3] Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU
    Gao, Jiaquan
    Qi, Panpan
    He, Guixia
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [4] A GPU Framework for Sparse Matrix Vector Multiplication
    Neelima, B.
    Reddy, G. Ram Mohana
    Raghavendra, Prakash S.
    2014 IEEE 13TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC), 2014, : 51 - 58
  • [5] Performance evaluation of sparse matrix-vector product (SpMV) computation on GPU architecture
    Kasmi, Najlae
    Mahmoudi, Sidi Ahmed
    Zbakh, Mostapha
    Manneback, Pierre
    2014 SECOND WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2014, : 23 - 27
  • [6] An Efficient Sparse Matrix Multiplication for skewed matrix on GPU
    Shah, Monika
    Patel, Vibha
    2012 IEEE 14TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2012 IEEE 9TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (HPCC-ICESS), 2012, : 1301 - 1306
  • [7] Efficient Sparse-Matrix Multi-Vector Product on GPUs
    Hong, Changwan
    Sukumaran-Rajam, Aravind
    Bandyopadhyay, Bortik
    Kim, Jinsung
    Kurt, Sureyya Emre
    Nisa, Israt
    Sabhlok, Shivani
    Catalyurek, Umit V.
    Parthasarathy, Srinivasan
    Sadayappan, P.
    HPDC '18: PROCEEDINGS OF THE 27TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, 2018, : 66 - 79
  • [8] GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication
    Tao, Yuan
    Deng, Yangdong
    Mu, Shuai
    Zhang, Zhenzhong
    Zhu, Mingfa
    Xiao, Limin
    Ruan, Li
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, 27 (14): : 3771 - 3789
  • [9] Generating optimal CUDA sparse matrix-vector product implementations for evolving GPU hardware
    El Zein, Ahmed H.
    Rendell, Alistair P.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2012, 24 (01): : 3 - 13
  • [10] Improving the locality of the sparse matrix-vector product on shared memory multiprocessors
    Pichel, JC
    Heras, DB
    Cabaleiro, JC
    Rivera, FF
    12TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, PROCEEDINGS, 2004, : 66 - 71