A UNIFIED SPARSE MATRIX DATA FORMAT FOR EFFICIENT GENERAL SPARSE MATRIX-VECTOR MULTIPLICATION ON MODERN PROCESSORS WITH WIDE SIMD UNITS

Cited by: 149

Authors:

Kreutzer, Moritz [1]

Hager, Georg [1]

Wellein, Gerhard [1]

Fehske, Holger [2]

Bishop, Alan R. [3]

Affiliations:

[1] Univ Erlangen Nurnberg, Erlangen Reg Comp Ctr, D-91058 Erlangen, Germany

[2] Ernst Moritz Arndt Univ Greifswald, Inst Phys, D-17489 Greifswald, Germany

[3] Los Alamos Natl Lab, Theory Simulat & Computat Directorate, Los Alamos, NM 87545 USA

Source:

SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2014 / Vol. 36 / No. 05

Keywords:

sparse matrix; sparse matrix-vector multiplication; data format; performance model; SIMD; PERFORMANCE;

DOI:

10.1137/130930352

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-sigma, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-C-sigma spMVM kernel. SELL-C-sigma comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent ("catch-all") sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
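
The abstract names the format and its two tuning parameters but does not define them. As a rough illustration only, the sketch below assumes the usual reading of the name: C is the chunk height (rows processed together, ideally matching the SIMD width) and sigma is the sorting scope (rows are sorted by length only within windows of sigma rows). Each chunk is padded to its longest row and stored column-major, so that one SIMD operation can work on entry j of all C rows at once. The function names and the array names (`perm`, `chunk_ptr`, `chunk_len`) are hypothetical and are not taken from the paper; this is a plain-Python sketch, not the authors' implementation.

```python
def build_sell_c_sigma(rows, C=4, sigma=8):
    """Build a SELL-C-sigma-like layout (illustrative sketch).

    rows: one list of (column, value) pairs per matrix row.
    C: assumed chunk height; sigma: assumed sorting window.
    """
    n = len(rows)
    # Sort rows by descending length, but only inside windows of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: -len(rows[r]))
        perm.extend(window)

    chunk_ptr, chunk_len, cols, vals = [0], [], [], []
    for start in range(0, n, C):
        chunk = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk)  # longest row in this chunk
        chunk_len.append(width)
        # Column-major inside the chunk: entry j of all C rows is contiguous.
        for j in range(width):
            for lane in range(C):
                if lane < len(chunk) and j < len(rows[chunk[lane]]):
                    c, v = rows[chunk[lane]][j]
                else:
                    c, v = 0, 0.0  # zero padding (column 0 is a dummy index)
                cols.append(c)
                vals.append(v)
        chunk_ptr.append(len(vals))
    return perm, chunk_ptr, chunk_len, cols, vals


def spmv_sell_c_sigma(fmt, x, C=4):
    """Compute y = A @ x from the layout above; C must match the build call."""
    perm, chunk_ptr, chunk_len, cols, vals = fmt
    n = len(perm)
    y = [0.0] * n
    for k, width in enumerate(chunk_len):
        acc = [0.0] * C
        base = chunk_ptr[k]
        for j in range(width):
            # On real hardware this lane loop would be one SIMD operation.
            for lane in range(C):
                p = base + j * C + lane
                acc[lane] += vals[p] * x[cols[p]]
        for lane in range(C):
            idx = k * C + lane
            if idx < n:
                y[perm[idx]] = acc[lane]  # undo the row permutation
    return y
```

In this sketch, a small sigma keeps rows close to their original order, while a larger sigma groups rows of similar length into the same chunk and so reduces zero padding. Setting sigma to 1 disables sorting and gives plain Sliced ELLPACK with chunk height C.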


Pages: C401 - C423

Number of pages: 23

50 records in total

[22] Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS format on GPUs
Yoshizawa, Hiroki
Takahashi, Daisuke
15TH IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2012) / 10TH IEEE/IFIP INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (EUC 2012), 2012, : 130 - 136
[23] Performance evaluation of the sparse matrix-vector multiplication on modern architectures
Goumas, Georgios
Kourtis, Kornilios
Anastopoulos, Nikos
Karakasis, Vasileios
Koziris, Nectarios
JOURNAL OF SUPERCOMPUTING, 2009, 50 (01) : 36 - 77
[24] Efficient Sparse Matrix-Vector Multiplication on Intel PIUMA Architecture
Aananthakrishnan, Sriram
Pawlowski, Robert
Fryman, Joshua
Hur, Ibrahim
2020 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2020,
[25] HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors
Li, Wenxuan
Cheng, Helin
Lu, Zhengyang
Lu, Yuechen
Liu, Weifeng
2023 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, CLUSTER, 2023, : 209 - 220
[26] Efficient FCM Computations Using Sparse Matrix-Vector Multiplication
Puheim, Michal
Vascak, Jan
Machova, Kristina
2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016, : 4165 - 4170
[27] Efficient Multicore Sparse Matrix-Vector Multiplication for FE Electromagnetics
Fernandez, David M.
Giannacopoulos, Dennis
Gross, Warren J.
IEEE TRANSACTIONS ON MAGNETICS, 2009, 45 (03) : 1392 - 1395
[28] Efficient sparse matrix-vector multiplication using cache oblivious extension quadtree storage format
Zhang, Jilin
Wan, Jian
Li, Fangfang
Mao, Jie
Zhuang, Li
Yuan, Junfeng
Liu, Enyi
Yu, Zhuoer
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 54 : 490 - 500
[29] Node aware sparse matrix-vector multiplication
Bienz, Amanda
Gropp, William D.
Olson, Luke N.
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2019, 130 : 166 - 178
[30] Understanding the performance of sparse matrix-vector multiplication
Goumas, Georgios
Kourtis, Kornilios
Anastopoulos, Nikos
Karakasis, Vasileios
Koziris, Nectarios
PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008, : 283 - +
