A UNIFIED SPARSE MATRIX DATA FORMAT FOR EFFICIENT GENERAL SPARSE MATRIX-VECTOR MULTIPLICATION ON MODERN PROCESSORS WITH WIDE SIMD UNITS

Cited by: 149
Authors
Kreutzer, Moritz [1 ]
Hager, Georg [1 ]
Wellein, Gerhard [1 ]
Fehske, Holger [2 ]
Bishop, Alan R. [3 ]
Affiliations
[1] Univ Erlangen Nurnberg, Erlangen Reg Comp Ctr, D-91058 Erlangen, Germany
[2] Ernst Moritz Arndt Univ Greifswald, Inst Phys, D-17489 Greifswald, Germany
[3] Los Alamos Natl Lab, Theory Simulat & Computat Directorate, Los Alamos, NM 87545 USA
Source
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2014, Vol. 36, No. 5
Keywords
sparse matrix; sparse matrix-vector multiplication; data format; performance model; SIMD; PERFORMANCE;
DOI
10.1137/130930352
Chinese Library Classification (CLC)
O29 [Applied Mathematics];
Discipline code
070104;
Abstract
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-sigma, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-C-sigma spMVM kernel. SELL-C-sigma comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent ("catch-all") sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
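The abstract describes the SELL-C-sigma layout only at a high level. As a rough illustration (not the authors' reference implementation, which uses padded, aligned chunks and SIMD intrinsics, OpenMP, or CUDA), the following self-contained C sketch converts a small CRS matrix into a SELL-C-sigma-like structure and applies the corresponding spMVM kernel; the chunk height C = 2, the global row sorting (sigma equal to the number of rows), the test matrix, and all variable names are illustrative choices of this summary, not taken from the paper.

#include <stdio.h>
#include <stdlib.h>

#define C 2   /* chunk height; in practice chosen to match the SIMD width */

int main(void) {
    /* 6x6 test matrix in CRS (row pointers, column indices, values) */
    int    n           = 6;
    int    rpt[7]      = {0, 2, 5, 6, 9, 10, 12};
    int    crs_col[12] = {0,3,  1,2,5,  2,  0,3,4,  4,  1,5};
    double crs_val[12] = {1,2,  3,4,5,  6,  7,8,9, 10, 11,12};
    double x[6] = {1, 1, 1, 1, 1, 1}, y[6] = {0};

    /* 1. Sort row indices by descending row length (sorting window sigma = n,
          i.e. global sorting; sigma = 1 would mean "no sorting"). */
    int perm[6];
    for (int i = 0; i < n; i++) perm[i] = i;
    for (int i = 0; i < n; i++)                      /* simple selection sort */
        for (int k = i + 1; k < n; k++) {
            int li = rpt[perm[i] + 1] - rpt[perm[i]];
            int lk = rpt[perm[k] + 1] - rpt[perm[k]];
            if (lk > li) { int t = perm[i]; perm[i] = perm[k]; perm[k] = t; }
        }

    /* 2. Group the sorted rows into chunks of C rows; each chunk is padded
          to the length of its longest row.  cl[] holds chunk widths, cs[]
          the starting offset of each chunk (sizes fit nchunks = 3 here). */
    int nchunks = (n + C - 1) / C;
    int cl[3], cs[4];
    cs[0] = 0;
    for (int c = 0; c < nchunks; c++) {
        cl[c] = 0;
        for (int i = 0; i < C; i++) {
            int len = rpt[perm[c*C + i] + 1] - rpt[perm[c*C + i]];
            if (len > cl[c]) cl[c] = len;
        }
        cs[c + 1] = cs[c] + cl[c] * C;
    }
    double *val = calloc(cs[nchunks], sizeof(double));  /* zeros act as padding */
    int    *col = calloc(cs[nchunks], sizeof(int));
    for (int c = 0; c < nchunks; c++)
        for (int i = 0; i < C; i++) {
            int r = perm[c*C + i];
            for (int j = 0; j < rpt[r + 1] - rpt[r]; j++) {
                /* column-major inside the chunk: element j of all C rows
                   is stored contiguously, which is what enables SIMD */
                val[cs[c] + j*C + i] = crs_val[rpt[r] + j];
                col[cs[c] + j*C + i] = crs_col[rpt[r] + j];
            }
        }

    /* 3. SELL-C-sigma spMVM kernel: the innermost loop over i touches the
          C rows of a chunk with unit stride and vectorizes with width C. */
    for (int c = 0; c < nchunks; c++)
        for (int j = 0; j < cl[c]; j++)
            for (int i = 0; i < C; i++)
                y[perm[c*C + i]] += val[cs[c] + j*C + i] * x[col[cs[c] + j*C + i]];

    for (int i = 0; i < n; i++) printf("y[%d] = %g\n", i, y[i]);
    free(val);
    free(col);
    return 0;
}

For the all-ones input vector the program prints the row sums (3, 12, 6, 24, 10, 23). The point visible in the kernel is that the innermost loop runs over the C rows of a chunk with unit stride, so it maps directly onto a SIMD unit of width C, while sorting rows within windows of size sigma keeps rows of similar length together and limits the zero padding added to short rows.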
Pages: C401 - C423
Page count: 23