A UNIFIED SPARSE MATRIX DATA FORMAT FOR EFFICIENT GENERAL SPARSE MATRIX-VECTOR MULTIPLICATION ON MODERN PROCESSORS WITH WIDE SIMD UNITS

Cited by: 149
Authors
Kreutzer, Moritz [1 ]
Hager, Georg [1 ]
Wellein, Gerhard [1 ]
Fehske, Holger [2 ]
Bishop, Alan R. [3 ]
Affiliations
[1] Univ Erlangen Nurnberg, Erlangen Reg Comp Ctr, D-91058 Erlangen, Germany
[2] Ernst Moritz Arndt Univ Greifswald, Inst Phys, D-17489 Greifswald, Germany
[3] Los Alamos Natl Lab, Theory Simulat & Computat Directorate, Los Alamos, NM 87545 USA
Source
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2014, Vol. 36, No. 5
Keywords
sparse matrix; sparse matrix-vector multiplication; data format; performance model; SIMD; PERFORMANCE;
DOI
10.1137/130930352
Chinese Library Classification (CLC)
O29 [Applied Mathematics];
Discipline code
070104;
Abstract
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-sigma, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-C-sigma spMVM kernel. SELL-C-sigma comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent ("catch-all") sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
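The abstract describes the SELL-C-sigma layout only at a high level. As a rough illustration (not the authors' reference implementation, which uses padded, aligned chunks and SIMD intrinsics, OpenMP, or CUDA), the following self-contained C sketch converts a small CRS matrix into a SELL-C-sigma-like structure and applies the corresponding spMVM kernel; the chunk height C = 2, the global row sorting (sigma equal to the number of rows), the test matrix, and all variable names are illustrative choices of this summary, not taken from the paper.

#include <stdio.h>
#include <stdlib.h>

#define C 2   /* chunk height; in practice chosen to match the SIMD width */

int main(void) {
    /* 6x6 test matrix in CRS (row pointers, column indices, values) */
    int    n           = 6;
    int    rpt[7]      = {0, 2, 5, 6, 9, 10, 12};
    int    crs_col[12] = {0,3,  1,2,5,  2,  0,3,4,  4,  1,5};
    double crs_val[12] = {1,2,  3,4,5,  6,  7,8,9, 10, 11,12};
    double x[6] = {1, 1, 1, 1, 1, 1}, y[6] = {0};

    /* 1. Sort row indices by descending row length (sorting window sigma = n,
          i.e. global sorting; sigma = 1 would mean "no sorting"). */
    int perm[6];
    for (int i = 0; i < n; i++) perm[i] = i;
    for (int i = 0; i < n; i++)                      /* simple selection sort */
        for (int k = i + 1; k < n; k++) {
            int li = rpt[perm[i] + 1] - rpt[perm[i]];
            int lk = rpt[perm[k] + 1] - rpt[perm[k]];
            if (lk > li) { int t = perm[i]; perm[i] = perm[k]; perm[k] = t; }
        }

    /* 2. Group the sorted rows into chunks of C rows; each chunk is padded
          to the length of its longest row.  cl[] holds chunk widths, cs[]
          the starting offset of each chunk (sizes fit nchunks = 3 here). */
    int nchunks = (n + C - 1) / C;
    int cl[3], cs[4];
    cs[0] = 0;
    for (int c = 0; c < nchunks; c++) {
        cl[c] = 0;
        for (int i = 0; i < C; i++) {
            int len = rpt[perm[c*C + i] + 1] - rpt[perm[c*C + i]];
            if (len > cl[c]) cl[c] = len;
        }
        cs[c + 1] = cs[c] + cl[c] * C;
    }
    double *val = calloc(cs[nchunks], sizeof(double));  /* zeros act as padding */
    int    *col = calloc(cs[nchunks], sizeof(int));
    for (int c = 0; c < nchunks; c++)
        for (int i = 0; i < C; i++) {
            int r = perm[c*C + i];
            for (int j = 0; j < rpt[r + 1] - rpt[r]; j++) {
                /* column-major inside the chunk: element j of all C rows
                   is stored contiguously, which is what enables SIMD */
                val[cs[c] + j*C + i] = crs_val[rpt[r] + j];
                col[cs[c] + j*C + i] = crs_col[rpt[r] + j];
            }
        }

    /* 3. SELL-C-sigma spMVM kernel: the innermost loop over i touches the
          C rows of a chunk with unit stride and vectorizes with width C. */
    for (int c = 0; c < nchunks; c++)
        for (int j = 0; j < cl[c]; j++)
            for (int i = 0; i < C; i++)
                y[perm[c*C + i]] += val[cs[c] + j*C + i] * x[col[cs[c] + j*C + i]];

    for (int i = 0; i < n; i++) printf("y[%d] = %g\n", i, y[i]);
    free(val);
    free(col);
    return 0;
}

For the all-ones input vector the program prints the row sums (3, 12, 6, 24, 10, 23). The point visible in the kernel is that the innermost loop runs over the C rows of a chunk with unit stride, so it maps directly onto a SIMD unit of width C, while sorting rows within windows of size sigma keeps rows of similar length together and limits the zero padding added to short rows.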
Pages: C401 - C423
Page count: 23