A UNIFIED SPARSE MATRIX DATA FORMAT FOR EFFICIENT GENERAL SPARSE MATRIX-VECTOR MULTIPLICATION ON MODERN PROCESSORS WITH WIDE SIMD UNITS

Cited by: 149

Authors:

Kreutzer, Moritz [1]

Hager, Georg [1]

Wellein, Gerhard [1]

Fehske, Holger [2]

Bishop, Alan R. [3]

Affiliations:

[1] Univ Erlangen Nurnberg, Erlangen Reg Comp Ctr, D-91058 Erlangen, Germany

[2] Ernst Moritz Arndt Univ Greifswald, Inst Phys, D-17489 Greifswald, Germany

[3] Los Alamos Natl Lab, Theory Simulat & Computat Directorate, Los Alamos, NM 87545 USA

Source:

SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2014 / Vol. 36 / Issue 05

Keywords:

sparse matrix; sparse matrix-vector multiplication; data format; performance model; SIMD; PERFORMANCE;

DOI:

10.1137/130930352

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-sigma, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models, we develop deep insight into the data transfer properties of the SELL-C-sigma spMVM kernel. SELL-C-sigma comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent ("catch-all") sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
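
The abstract names the format and its two tuning parameters but does not spell out the layout. As a rough illustration only, the following Python sketch follows the way SELL-C-sigma is commonly described: rows are sorted by length within windows of `sigma` rows, grouped into chunks of `C` rows, and each chunk is zero-padded to its longest row and stored column-major. The function names `build_sell_c_sigma` and `sell_spmv`, the input layout (a list of `(column, value)` pairs per row), and the use of column index 0 for padding are assumptions made for this sketch, not the authors' implementation.

```python
def build_sell_c_sigma(rows, C, sigma):
    """Build a SELL-C-sigma-like layout (illustrative sketch).

    rows  : list of rows, each a list of (col, value) pairs (assumed input form)
    C     : chunk height (number of rows per chunk)
    sigma : sorting scope (rows are sorted by length inside each window)
    """
    n = len(rows)
    # Sort rows by descending length, but only within windows of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: len(rows[r]), reverse=True)
        perm.extend(window)

    chunk_ptr, chunk_len, col, val = [0], [], [], []
    for start in range(0, n, C):
        chunk = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk)  # longest row in this chunk
        chunk_len.append(width)
        # Column-major inside the chunk: the j-th entries of all C rows
        # are stored contiguously, which is what makes the layout SIMD-friendly.
        for j in range(width):
            for lane in range(C):
                if lane < len(chunk) and j < len(rows[chunk[lane]]):
                    c, v = rows[chunk[lane]][j]
                else:
                    c, v = 0, 0.0  # zero padding (assumed convention)
                col.append(c)
                val.append(v)
        chunk_ptr.append(len(val))
    return perm, chunk_ptr, chunk_len, col, val


def sell_spmv(fmt, C, x):
    """Compute y = A @ x from the layout returned by build_sell_c_sigma."""
    perm, chunk_ptr, chunk_len, col, val = fmt
    n = len(perm)
    y = [0.0] * n
    for k, width in enumerate(chunk_len):
        base = chunk_ptr[k]
        for lane in range(C):
            idx = k * C + lane
            if idx >= n:
                break  # last chunk may hold fewer than C real rows
            acc = 0.0
            for j in range(width):
                p = base + j * C + lane
                acc += val[p] * x[col[p]]
            y[perm[idx]] = acc  # undo the row permutation
    return y
```

In this sketch, `C = 1` stores no padding at all, while setting `C` to the number of rows (with `sigma = 1`) reproduces plain ELLPACK-style padding. A larger `sigma` tends to reduce padding because rows of similar length end up in the same chunk.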

Citation

Pages: C401 / C423

Page count: 23

