A Multilevel Compressed Sparse Row Format for Efficient Sparse Computations on Multicore Processors

Times cited: 0
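Authors
Kabir, Humayun [1]
Booth, Joshua Dennis [1]
Raghavan, Padma [1]
Affiliation
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Funding
U.S. National Science Foundation;
Keywords
PERFORMANCE;
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract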
We seek to improve the performance of sparse matrix computations on multicore processors with non-uniform memory access (NUMA). Typical implementations use a bandwidth-reducing ordering of the matrix to increase locality of accesses, together with a compressed storage format to store and operate only on the non-zero values. We propose a new multilevel storage format and a companion ordering scheme as an explicit adaptation to map to NUMA hierarchies. More specifically, we propose CSR-k, a multilevel form of the popular compressed sparse row (CSR) format for a multicore processor with k > 1 well-differentiated levels in the memory subsystem. Additionally, we develop Band-k, a modified form of a traditional bandwidth reduction scheme, to convert a matrix represented in CSR to our proposed CSR-k. We evaluate the performance of the widely used and important sparse matrix-vector multiplication (SpMV) kernel using CSR-2 on Intel Westmere processors for a test suite of 12 large sparse matrices with row densities in the range 3 to 45. On 32 cores, on average across all matrices in the test suite, the execution time for SpMV with CSR-2 is less than 42% of the time taken by the state-of-the-art automatically tuned SpMV, resulting in energy savings of approximately 56%. Additionally, on average, the parallel speed-up on 32 cores of the automatically tuned SpMV relative to its 1-core performance is 8.18, compared to a value of 19.71 for CSR-2. Our analysis indicates that the higher performance of SpMV with CSR-2 comes from achieving higher reuse of x in the shared L3 cache without incurring overheads from fill-in of original zeroes. Furthermore, the pre-processing costs of SpMV with CSR-2 can be amortized on average over 97 iterations of SpMV using CSR and are substantially lower than the 513 iterations required for the automatically tuned implementation.
Based on these results, CSR-k appears to be a promising multilevel formulation of CSR for adapting sparse computations to multicore processors with NUMA memory hierarchies.
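To make the multilevel idea concrete, here is a minimal Python sketch of SpMV over plain CSR and over a CSR-k-style layout that stacks extra pointer arrays on top of CSR. It assumes, purely for illustration, that each extra level groups a fixed number of consecutive units from the level below (rows into super-rows, super-rows into super-super-rows). In the paper the grouping is produced by the Band-k ordering, which is not reproduced here. The names `build_level_ptr` and `csrk_spmv` are hypothetical, and the traversal is sequential: the top-level loop only marks where work would be divided among cores.

```python
from typing import List, Sequence


def build_level_ptr(n_units: int, group_size: int) -> List[int]:
    """Group n_units consecutive units into blocks of group_size.

    Returns a pointer array: block b covers units ptr[b] .. ptr[b+1]-1.
    Fixed-size grouping is an illustrative assumption, not Band-k.
    """
    if group_size < 1:
        raise ValueError("group_size must be positive")
    ptr = list(range(0, n_units, group_size))
    ptr.append(n_units)
    return ptr


def csrk_spmv(levels: Sequence[Sequence[int]],
              row_ptr: Sequence[int],
              col_idx: Sequence[int],
              vals: Sequence[float],
              x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for A in CSR plus k-1 extra grouping levels.

    levels[0] groups rows into super-rows, levels[1] groups super-rows,
    and so on. An empty `levels` means plain CSR (CSR-1).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows

    def visit(level: int, lo: int, hi: int) -> None:
        if level < 0:
            # Bottom level: ordinary CSR row loop over rows lo .. hi-1.
            for i in range(lo, hi):
                acc = 0.0
                for p in range(row_ptr[i], row_ptr[i + 1]):
                    acc += vals[p] * x[col_idx[p]]
                y[i] = acc
            return
        ptr = levels[level]
        # Descend into each group at this level, in order.
        for g in range(lo, hi):
            visit(level - 1, ptr[g], ptr[g + 1])

    top = len(levels) - 1
    n_top = len(levels[top]) - 1 if levels else n_rows
    # A parallel version would split this top-level range across cores;
    # here it is traversed sequentially.
    visit(top, 0, n_top)
    return y


if __name__ == "__main__":
    # 4x4 tridiagonal matrix [[4,1,0,0],[1,4,1,0],[0,1,4,1],[0,0,1,4]] in CSR.
    row_ptr = [0, 2, 5, 8, 10]
    col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
    vals = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
    x = [1.0, 2.0, 3.0, 4.0]

    super_rows = build_level_ptr(4, 2)                     # CSR-2 level: [0, 2, 4]
    super_super = build_level_ptr(len(super_rows) - 1, 2)  # CSR-3 level: [0, 2]
    # Each of the three calls prints [6.0, 12.0, 18.0, 19.0].
    print(csrk_spmv([], row_ptr, col_idx, vals, x))                         # CSR
    print(csrk_spmv([super_rows], row_ptr, col_idx, vals, x))               # CSR-2
    print(csrk_spmv([super_rows, super_super], row_ptr, col_idx, vals, x))  # CSR-3
```

With an empty `levels` list the traversal reduces to ordinary CSR. In this sketch the extra levels cost only one integer per group and leave the stored non-zeros untouched, so no explicit zeros are added; this is in line with the abstract's point that CSR-2 avoids fill-in overheads.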
Pages: 10
Related papers
50 in total
  • [11] Efficient algorithm for sparse matrix computations
    Park, S.C.
    Draayer, J.P.
    Zheng, S.-Q.
    Applied Computing: Technological Challenges of the 1990's, 1992, pp. 919-926
  • [12] Rateless Codes for Distributed Computations with Sparse Compressed Matrices
    Mallick, Ankur
    Joshi, Gauri
    2019 IEEE International Symposium on Information Theory (ISIT), 2019, pp. 2793-2797
  • [13] Packed Compressed Sparse Row: A Dynamic Graph Representation
    Wheatman, Brian
    Xu, Helen
    2018 IEEE High Performance Extreme Computing Conference (HPEC), 2018
  • [14] A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units
    Kreutzer, Moritz
    Hager, Georg
    Wellein, Gerhard
    Fehske, Holger
    Bishop, Alan R.
    SIAM Journal on Scientific Computing, 2014, 36(5), pp. C401-C423
  • [15] Machine Learning to Design an Auto-tuning System for the Best Compressed Format Detection for Parallel Sparse Computations
    Hamdi-Larbi, Olfa
    Mehrez, Ichrak
    Dufaud, Thomas
    Parallel Processing Letters, 2021, 31(4)
  • [16] Towards Efficient Algorithms for Compressed Sparse-Sparse Matrix Product
    Ezouaoui, Sana
    Hamdi-Larbi, Olfa
    Mahjoub, Zaher
    2017 International Conference on High Performance Computing & Simulation (HPCS), 2017, pp. 651-658
  • [17] Characterizing the efficiency of multicore and manycore processors for the solution of sparse linear systems
    Aliaga, Jose I.
    Barreda, Maria
    Dufrechou, Ernesto
    Ezzatti, Pablo
    Quintana-Orti, Enrique S.
    Computer Science-Research and Development, 2016, 31(4), pp. 175-183
  • [18] An Efficient Storage Format for Large Sparse Matrices
    Farzaneh, Aiyoub
    Kheiri, Hossein
    Shahmersi, Mehdi Abbaspour
    Communications Faculty of Sciences University of Ankara-Series A1 Mathematics and Statistics, 2009, 58(2), pp. 1-10
  • [19] Efficient MATLAB computations with sparse and factored tensors
    Bader, Brett W.
    Kolda, Tamara G.
    SIAM Journal on Scientific Computing, 2007, 30(1), pp. 205-231
  • [20] Effect of the storage format of sparse linear systems on parallel CFD computations
    Dutto, LC
    Lepage, CY
    Habashi, WG
    Computer Methods in Applied Mechanics and Engineering, 2000, 188(1-3), pp. 441-453