A Multilevel Compressed Sparse Row Format for Efficient Sparse Computations on Multicore Processors

Citations: 0
Authors
Kabir, Humayun [1 ]
Booth, Joshua Dennis [1 ]
Raghavan, Padma [1 ]
Affiliations
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Funding
U.S. National Science Foundation;
Keywords
PERFORMANCE;
DOI
Not available
Chinese Library Classification (CLC) Number
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
We seek to improve the performance of sparse matrix computations on multicore processors with non-uniform memory access (NUMA). Typical implementations use a bandwidth reducing ordering of the matrix to increase locality of accesses with a compressed storage format to store and operate only on the non-zero values. We propose a new multilevel storage format and a companion ordering scheme as an explicit adaptation to map to NUMA hierarchies. More specifically, we propose CSR-k, a multilevel form of the popular compressed sparse row (CSR) format for a multicore processor with k > 1 well-differentiated levels in the memory subsystem. Additionally, we develop Band-k, a modified form of a traditional bandwidth reduction scheme, to convert a matrix represented in CSR to our proposed CSR-k. We evaluate the performance of the widely-used and important sparse matrix-vector multiplication (SpMV) kernel using CSR-2 on Intel Westmere processors for a test suite of 12 large sparse matrices with row densities in the range 3 to 45. On 32 cores, on average across all matrices in the test suite, the execution time for SpMV with CSR-2 is less than 42% of the time taken by the state-of-the-art automatically tuned SpMV resulting in energy savings of approximately 56%. Additionally, on average, the parallel speed-up on 32 cores of the automatically tuned SpMV relative to its 1-core performance is 8.18 compared to a value of 19.71 for CSR-2. Our analysis indicates that the higher performance of SpMV with CSR-2 comes from achieving higher reuse of x in the shared L3 cache without incurring overheads from fill-in of original zeroes. Furthermore, the pre-processing costs of SpMV with CSR-2 can be amortized on average over 97 iterations of SpMV using CSR and are substantially lower than the 513 iterations required for the automatically tuned implementation. 
Based on these results, CSR-k appears to be a promising multilevel formulation of CSR for adapting sparse computations to multicore processors with NUMA memory hierarchies.
Pages: 10