A UNIFIED SPARSE MATRIX DATA FORMAT FOR EFFICIENT GENERAL SPARSE MATRIX-VECTOR MULTIPLICATION ON MODERN PROCESSORS WITH WIDE SIMD UNITS

Cited by: 149
Authors
Kreutzer, Moritz [1 ]
Hager, Georg [1 ]
Wellein, Gerhard [1 ]
Fehske, Holger [2 ]
Bishop, Alan R. [3 ]
Affiliations
[1] Univ Erlangen Nurnberg, Erlangen Reg Comp Ctr, D-91058 Erlangen, Germany
[2] Ernst Moritz Arndt Univ Greifswald, Inst Phys, D-17489 Greifswald, Germany
[3] Los Alamos Natl Lab, Theory Simulat & Computat Directorate, Los Alamos, NM 87545 USA
Source
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2014 / Vol. 36 / No. 05
Keywords
sparse matrix; sparse matrix-vector multiplication; data format; performance model; SIMD; PERFORMANCE;
DOI
10.1137/130930352
Chinese Library Classification number
O29 [Applied Mathematics];
Discipline classification code
070104 ;
Abstract
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-sigma, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-C-sigma spMVM kernel. SELL-C-sigma comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent ("catch-all") sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
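The abstract names the format and its two tuning parameters but does not spell out the storage layout. The following is a minimal Python sketch, assuming the commonly described SELL-C-sigma scheme: rows are sorted by descending length within windows of sigma rows, grouped into chunks of C rows, padded to the longest row of each chunk, and stored column-major inside the chunk so that one SIMD register could process C rows at once. The function names (build_sell_c_sigma, spmv_sell_c_sigma), the dictionary keys, and the input representation (a list of (column, value) lists per row) are hypothetical choices for illustration, not the authors' implementation.

```python
def build_sell_c_sigma(rows, C, sigma):
    """Build an illustrative SELL-C-sigma structure.

    rows  : list with one entry per matrix row, each a list of (col, value)
    C     : chunk height (assumed to match the SIMD width in rows)
    sigma : sorting window, in rows
    """
    n_rows = len(rows)

    # Step 1: sort rows by descending length, but only inside each
    # window of sigma consecutive rows (limits the permutation's reach).
    perm = []
    for start in range(0, n_rows, sigma):
        window = list(range(start, min(start + sigma, n_rows)))
        window.sort(key=lambda r: -len(rows[r]))
        perm.extend(window)

    # Step 2: cut the permuted rows into chunks of C rows, pad each chunk
    # to its longest row, and store entries column-major within the chunk.
    val, col, chunk_ptr, chunk_len = [], [], [0], []
    for start in range(0, n_rows, C):
        chunk_rows = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk_rows)
        chunk_len.append(width)
        for j in range(width):
            for i in range(C):
                if i < len(chunk_rows) and j < len(rows[chunk_rows[i]]):
                    c, v = rows[chunk_rows[i]][j]
                else:
                    c, v = 0, 0.0  # zero padding; column 0 is a safe dummy index
                col.append(c)
                val.append(v)
        chunk_ptr.append(len(val))

    return {"C": C, "n_rows": n_rows, "perm": perm, "val": val,
            "col": col, "chunk_ptr": chunk_ptr, "chunk_len": chunk_len}


def spmv_sell_c_sigma(m, x):
    """Compute y = A x for a structure produced by build_sell_c_sigma."""
    C, n_rows = m["C"], m["n_rows"]
    y = [0.0] * n_rows
    for k, width in enumerate(m["chunk_len"]):
        base = m["chunk_ptr"][k]
        for j in range(width):
            # The loop over i is the part a SIMD unit would execute in one step.
            for i in range(C):
                slot = k * C + i
                if slot < n_rows:
                    idx = base + j * C + i
                    # Write back through the permutation to the original row.
                    y[m["perm"][slot]] += m["val"][idx] * x[m["col"][idx]]
    return y
```

In this sketch a larger sigma tends to reduce padding because rows of similar length end up in the same chunk, while C = 1 with no sorting degenerates to a row-by-row layout similar to Compressed Row Storage.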
Pages: C401 / C423
Number of pages: 23
Related papers
50 in total
  • [21] Performance evaluation of the sparse matrix-vector multiplication on modern architectures
    Georgios Goumas
    Kornilios Kourtis
    Nikos Anastopoulos
    Vasileios Karakasis
    Nectarios Koziris
    The Journal of Supercomputing, 2009, 50 : 36 - 77
  • [22] Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS format on GPUs
    Yoshizawa, Hiroki
    Takahashi, Daisuke
    15TH IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2012) / 10TH IEEE/IFIP INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (EUC 2012), 2012, : 130 - 136
  • [23] Performance evaluation of the sparse matrix-vector multiplication on modern architectures
    Goumas, Georgios
    Kourtis, Kornilios
    Anastopoulos, Nikos
    Karakasis, Vasileios
    Koziris, Nectarios
    JOURNAL OF SUPERCOMPUTING, 2009, 50 (01): : 36 - 77
  • [24] Efficient Sparse Matrix-Vector Multiplication on Intel PIUMA Architecture
    Aananthakrishnan, Sriram
    Pawlowski, Robert
    Fryman, Joshua
    Hur, Ibrahim
    2020 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2020,
  • [25] HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors
    Li, Wenxuan
    Cheng, Helin
    Lu, Zhengyang
    Lu, Yuechen
    Liu, Weifeng
    2023 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, CLUSTER, 2023, : 209 - 220
  • [26] Efficient FCM Computations Using Sparse Matrix-Vector Multiplication
    Puheim, Michal
    Vascak, Jan
    Machova, Kristina
    2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016, : 4165 - 4170
  • [27] Efficient Multicore Sparse Matrix-Vector Multiplication for FE Electromagnetics
    Fernandez, David M.
    Giannacopoulos, Dennis
    Gross, Warren J.
    IEEE TRANSACTIONS ON MAGNETICS, 2009, 45 (03) : 1392 - 1395
  • [28] Efficient sparse matrix-vector multiplication using cache oblivious extension quadtree storage format
    Zhang, Jilin
    Wan, Jian
    Li, Fangfang
    Mao, Jie
    Zhuang, Li
    Yuan, Junfeng
    Liu, Enyi
    Yu, Zhuoer
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 54 : 490 - 500
  • [29] Node aware sparse matrix-vector multiplication
    Bienz, Amanda
    Gropp, William D.
    Olson, Luke N.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2019, 130 : 166 - 178
  • [30] Understanding the performance of sparse matrix-vector multiplication
    Goumas, Georgios
    Kourtis, Kornilios
    Anastopoulos, Nikos
    Karakasis, Vasileios
    Koziris, Nectarios
    PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008, : 283 - +