A UNIFIED SPARSE MATRIX DATA FORMAT FOR EFFICIENT GENERAL SPARSE MATRIX-VECTOR MULTIPLICATION ON MODERN PROCESSORS WITH WIDE SIMD UNITS

Cited by: 149
Authors
Kreutzer, Moritz [1 ]
Hager, Georg [1 ]
Wellein, Gerhard [1 ]
Fehske, Holger [2 ]
Bishop, Alan R. [3 ]
Affiliations
[1] Univ Erlangen Nurnberg, Erlangen Reg Comp Ctr, D-91058 Erlangen, Germany
[2] Ernst Moritz Arndt Univ Greifswald, Inst Phys, D-17489 Greifswald, Germany
[3] Los Alamos Natl Lab, Theory Simulat & Computat Directorate, Los Alamos, NM 87545 USA
Source
SIAM JOURNAL ON SCIENTIFIC COMPUTING | 2014 / Vol. 36 / No. 5
Keywords
sparse matrix; sparse matrix-vector multiplication; data format; performance model; SIMD; PERFORMANCE;
DOI
10.1137/130930352
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-sigma, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-C-sigma spMVM kernel. SELL-C-sigma comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent ("catch-all") sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
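The abstract does not spell out the two tuning parameters. The sketch below assumes the usual reading of the format's name: C is the chunk height (the number of rows stored together and padded to a common width) and sigma is the sorting scope (the window of rows sorted by descending length before chunking). It is a minimal pure-Python illustration, not the authors' implementation; the function names and array names (`build_sell_c_sigma`, `spmv`, `chunk_ptr`, `chunk_len`) are hypothetical, and zero padding with column index 0 is an assumption made for simplicity.

```python
def build_sell_c_sigma(rows, C, sigma):
    """Build a SELL-C-sigma-style layout (illustrative sketch).

    rows  : list of rows, each a list of (column, value) pairs
    C     : chunk height (rows per chunk), assumed meaning of "C"
    sigma : sorting scope (rows per sorting window), assumed meaning of "sigma"
    """
    n = len(rows)
    # Sort rows by descending length, but only within windows of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: -len(rows[r]))
        perm.extend(window)

    vals, cols, chunk_ptr, chunk_len = [], [], [0], []
    for start in range(0, n, C):
        chunk = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk)  # longest row in this chunk
        chunk_len.append(width)
        # Column-major storage inside the chunk: entry j of all C rows is
        # contiguous, which is what makes the layout SIMD-friendly.
        for j in range(width):
            for i in range(C):
                if i < len(chunk) and j < len(rows[chunk[i]]):
                    c, v = rows[chunk[i]][j]
                else:
                    c, v = 0, 0.0  # zero padding (assumed convention)
                cols.append(c)
                vals.append(v)
        chunk_ptr.append(len(vals))
    return perm, vals, cols, chunk_ptr, chunk_len


def spmv(fmt, C, x, n):
    """Compute y = A x from the layout produced by build_sell_c_sigma."""
    perm, vals, cols, chunk_ptr, chunk_len = fmt
    y = [0.0] * n
    for k, width in enumerate(chunk_len):
        base = chunk_ptr[k]
        for i in range(C):           # this loop maps onto SIMD lanes
            p = k * C + i
            if p >= n:
                break
            acc = 0.0
            for j in range(width):
                idx = base + j * C + i
                acc += vals[idx] * x[cols[idx]]
            y[perm[p]] = acc         # undo the row permutation
    return y


# A 4x4 example with row lengths 1, 3, 1, 2 (7 nonzeros).
rows = [
    [(0, 1.0)],
    [(0, 2.0), (1, 3.0), (3, 4.0)],
    [(2, 5.0)],
    [(1, 6.0), (3, 7.0)],
]
x = [1.0, 2.0, 3.0, 4.0]

unsorted_fmt = build_sell_c_sigma(rows, C=2, sigma=1)  # no sorting
sorted_fmt = build_sell_c_sigma(rows, C=2, sigma=4)    # sort all 4 rows

y_unsorted = spmv(unsorted_fmt, 2, x, len(rows))
y_sorted = spmv(sorted_fmt, 2, x, len(rows))
stored_unsorted = len(unsorted_fmt[1])  # 10 stored entries
stored_sorted = len(sorted_fmt[1])      # 8 stored entries
```

In this small example, sorting within a window of four rows groups the two longest rows into one chunk, so the layout stores 8 entries instead of 10 for the same 7 nonzeros, while both variants produce the same result vector.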
Pages: C401-C423
Page count: 23
Related papers
50 in total
  • [41] An Efficient Sparse Matrix-Vector Multiplication on Distributed Memory Parallel Computers
    Shahnaz, Rukhsana
    Usman, Anila
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2007, 7 (01): : 77 - 82
  • [42] Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU
    Gao, Jiaquan
    Qi, Panpan
    He, Guixia
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2016, 2016
  • [43] CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication
    Liu, Weifeng
    Vinter, Brian
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'15), 2015, : 339 - 350
  • [44] AdELL: An Adaptive Warp-Balancing ELL Format for Efficient Sparse Matrix-Vector Multiplication on GPUs
    Maggioni, Marco
    Berger-Wolf, Tanya
    2013 42ND ANNUAL INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2013, : 11 - 20
  • [45] Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication Using Compressed Sparse Blocks
    Buluc, Aydin
    Fineman, Jeremy T.
    Frigo, Matteo
    Gilbert, John R.
    Leiserson, Charles E.
    SPAA'09: PROCEEDINGS OF THE TWENTY-FIRST ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2009, : 233 - 244
  • [46] Sparse Matrix-Vector Multiplication: A Data Mapping-Based Architecture
    Mansour, Ahmad
    Goetze, Juergen
    Hsu, Wei-Chun
    Ruan, Shanq-Jang
    2014 15TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT 2014), 2014, : 152 - 158
  • [47] An Effective Approach for Implementing Sparse Matrix-Vector Multiplication on Graphics Processing Units
    Abu-Sufah, Walid
    Karim, Asma Abdel
    2012 IEEE 14TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2012 IEEE 9TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (HPCC-ICESS), 2012, : 453 - 460
  • [48] Communication balancing in parallel sparse matrix-vector multiplication
    Bisseling, RH
    Meesen, W
    ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2005, 21 : 47 - 65
  • [49] Sparse matrix-vector multiplication on network-on-chip
    Sun, C-C
    Goetze, J.
    Jheng, H-Y
    Ruan, S-J
    ADVANCES IN RADIO SCIENCE, 2010, 8 : 289 - 294
  • [50] Implementing Sparse Matrix-Vector Multiplication with QCSR on GPU
    Zhang, Jilin
    Liu, Enyi
    Wan, Jian
    Ren, Yongjian
    Yue, Miao
    Wang, Jue
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (02): : 473 - 482