A UNIFIED SPARSE MATRIX DATA FORMAT FOR EFFICIENT GENERAL SPARSE MATRIX-VECTOR MULTIPLICATION ON MODERN PROCESSORS WITH WIDE SIMD UNITS

Cited by: 149
Authors
Kreutzer, Moritz [1]
Hager, Georg [1]
Wellein, Gerhard [1]
Fehske, Holger [2]
Bishop, Alan R. [3]
Affiliations
[1] Univ Erlangen Nurnberg, Erlangen Reg Comp Ctr, D-91058 Erlangen, Germany
[2] Ernst Moritz Arndt Univ Greifswald, Inst Phys, D-17489 Greifswald, Germany
[3] Los Alamos Natl Lab, Theory Simulat & Computat Directorate, Los Alamos, NM 87545 USA
Source
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2014, Vol. 36, No. 5
Keywords
sparse matrix; sparse matrix-vector multiplication; data format; performance model; SIMD; performance
DOI
10.1137/130930352
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-sigma, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-C-sigma spMVM kernel. SELL-C-sigma comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent ("catch-all") sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
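The abstract does not spell out the storage layout, so the following is only an illustrative sketch of how a sliced ELLPACK variant with two tuning parameters could be built and used. It assumes that C is the chunk height (rows stored together and padded to the longest row of the chunk) and that sigma is the window within which rows are sorted by length before chunking. The function names `build_sell_c_sigma` and `spmv`, the padding convention (column 0, value 0.0), and the tiny test matrix are assumptions made for this example and are not taken from the paper.

```python
def build_sell_c_sigma(rows, C, sigma):
    """Build a chunked, padded layout from rows given as lists of (col, val).

    Assumed meaning of the parameters: C is the chunk height, sigma is the
    sorting window (sigma = 1 means no sorting).
    """
    n = len(rows)
    # Sort rows by descending length, but only inside windows of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: len(rows[r]), reverse=True)
        perm.extend(window)

    chunk_ptr, chunk_len, cols, vals = [0], [], [], []
    for start in range(0, n, C):
        chunk = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk)
        chunk_len.append(width)
        # Column-major storage inside the chunk: entry j of all C rows is
        # contiguous, which is what makes the layout SIMD-friendly.
        for j in range(width):
            for lane in range(C):
                if lane < len(chunk) and j < len(rows[chunk[lane]]):
                    col, val = rows[chunk[lane]][j]
                else:
                    col, val = 0, 0.0  # zero padding (assumed convention)
                cols.append(col)
                vals.append(val)
        chunk_ptr.append(len(vals))
    return perm, chunk_ptr, chunk_len, cols, vals


def spmv(fmt, x, C, n):
    """Compute y = A @ x from the layout returned by build_sell_c_sigma."""
    perm, chunk_ptr, chunk_len, cols, vals = fmt
    y = [0.0] * n
    for k, width in enumerate(chunk_len):
        base = chunk_ptr[k]
        for lane in range(C):
            i = k * C + lane
            if i >= n:
                break
            acc = 0.0
            for j in range(width):
                idx = base + j * C + lane
                acc += vals[idx] * x[cols[idx]]
            y[perm[i]] = acc  # undo the row permutation on output
    return y


# Hypothetical 4x4 test matrix with alternating short and long rows.
rows = [
    [(0, 1.0)],
    [(0, 2.0), (1, 3.0), (3, 4.0)],
    [(2, 5.0)],
    [(1, 6.0), (2, 7.0), (3, 8.0)],
]
x = [1.0, 2.0, 3.0, 4.0]

sorted_fmt = build_sell_c_sigma(rows, C=2, sigma=4)
unsorted_fmt = build_sell_c_sigma(rows, C=2, sigma=1)
y_sorted = spmv(sorted_fmt, x, C=2, n=4)
y_unsorted = spmv(unsorted_fmt, x, C=2, n=4)
stored_sorted = len(sorted_fmt[4])
stored_unsorted = len(unsorted_fmt[4])
```

In this toy case both layouts give the same product, but sorting within a window of four rows stores 8 entries (no padding) while the unsorted layout stores 12, which illustrates why the sorting window can matter for data transfer.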
Pages: C401-C423
Page count: 23
Related papers
50 in total
  • [1] VBSF: a new storage format for SIMD sparse matrix-vector multiplication on modern processors
    Li, Yishui
    Xie, Peizhen
    Chen, Xinhai
    Liu, Jie
    Yang, Bo
    Li, Shengguo
    Gong, Chunye
    Gan, Xinbiao
    Xu, Han
    JOURNAL OF SUPERCOMPUTING, 2020, 76 (03): : 2063 - 2081
  • [2] An efficient SIMD compression format for sparse matrix-vector multiplication
    Chen, Xinhai
    Xie, Peizhen
    Chi, Lihua
    Liu, Jie
    Gong, Chunye
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23):
  • [3] VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors
    Yishui Li
    Peizhen Xie
    Xinhai Chen
    Jie Liu
    Bo Yang
    Shengguo Li
    Chunye Gong
    Xinbiao Gan
    Han Xu
    The Journal of Supercomputing, 2020, 76 : 2063 - 2081
  • [4] Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors
    Zhang, Kai
    Chen, Shuming
    Wang, Yaohua
    Wan, Jianghua
    IEICE ELECTRONICS EXPRESS, 2013, 10 (09):
  • [5] Adaptive sparse matrix representation for efficient matrix-vector multiplication
    Zardoshti, Pantea
    Khunjush, Farshad
    Sarbazi-Azad, Hamid
    JOURNAL OF SUPERCOMPUTING, 2016, 72 (09): : 3366 - 3386
  • [6] Heterogeneous sparse matrix-vector multiplication via compressed sparse row format
    Lane, Phillip Allen
    Booth, Joshua Dennis
    PARALLEL COMPUTING, 2023, 115
  • [7] Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format
    Greathouse, Joseph L.
    Daga, Mayank
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 769 - 780
  • [8] An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication
    Karakasis, Vasileios
    Gkountouvas, Theodoros
    Kourtis, Kornilios
    Goumas, Georgios
    Koziris, Nectarios
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2013, 24 (10) : 1930 - 1940
  • [9] Structured sparse matrix-vector multiplication on massively parallel SIMD architectures
    Dehn, T
    Eiermann, M
    Giebermann, K
    Sperling, V
    PARALLEL COMPUTING, 1995, 21 (12) : 1867 - 1894
  • [10] SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision
    Hishinuma, Toshiaki
    Hasegawa, Hidehiko
    Tanaka, Teruo
    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2016, 2017, 10150 : 21 - 34