A Multilevel Compressed Sparse Row Format for Efficient Sparse Computations on Multicore Processors

Times cited: 0

Authors:

Kabir, Humayun [1]

Booth, Joshua Dennis [1]

Raghavan, Padma [1]

Affiliation:

[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA

Source:

2014 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC) | 2014

Funding:

National Science Foundation (NSF);

Keywords:

PERFORMANCE;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline code:

0812;

Abstract:

We seek to improve the performance of sparse matrix computations on multicore processors with non-uniform memory access (NUMA). Typical implementations use a bandwidth-reducing ordering of the matrix to increase locality of accesses, together with a compressed storage format to store and operate only on the nonzero values. We propose a new multilevel storage format and a companion ordering scheme, explicitly adapted to map to NUMA hierarchies. More specifically, we propose CSR-k, a multilevel form of the popular compressed sparse row (CSR) format, for a multicore processor with k > 1 well-differentiated levels in the memory subsystem. Additionally, we develop Band-k, a modified form of a traditional bandwidth reduction scheme, to convert a matrix represented in CSR to our proposed CSR-k. We evaluate the performance of the widely used and important sparse matrix-vector multiplication (SpMV) kernel using CSR-2 on Intel Westmere processors for a test suite of 12 large sparse matrices with row densities in the range 3 to 45. On 32 cores, averaged across all matrices in the test suite, the execution time for SpMV with CSR-2 is less than 42% of the time taken by the state-of-the-art automatically tuned SpMV, resulting in energy savings of approximately 56%. Additionally, on average, the parallel speed-up on 32 cores of the automatically tuned SpMV relative to its 1-core performance is 8.18, compared to 19.71 for CSR-2. Our analysis indicates that the higher performance of SpMV with CSR-2 comes from achieving higher reuse of x in the shared L3 cache without incurring overheads from fill-in of original zeroes. Furthermore, the pre-processing costs of SpMV with CSR-2 can be amortized on average over 97 iterations of SpMV using CSR, substantially fewer than the 513 iterations required for the automatically tuned implementation.
Based on these results, CSR-k appears to be a promising multilevel formulation of CSR for adapting sparse computations to multicore processors with NUMA memory hierarchies.
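
The abstract builds on the standard CSR layout and the SpMV kernel y = Ax. The block below is a minimal Python sketch for orientation, not the authors' code. It stores a small matrix in CSR (`row_ptr`, `col_idx`, `vals`), computes y = Ax, and measures the matrix bandwidth (the largest |i - j| over the nonzeros), which is the quantity a bandwidth-reducing ordering tries to keep small. The `group_rows` and `spmv_two_level` helpers are hypothetical. They only illustrate one way a second level of row grouping could sit on top of CSR and do not reproduce the paper's CSR-k or Band-k definitions. All function names here are assumptions chosen for illustration.

```python
from typing import List, Tuple


def dense_to_csr(a: List[List[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Convert a dense row-major matrix to CSR arrays, keeping only nonzeros."""
    row_ptr: List[int] = [0]
    col_idx: List[int] = []
    vals: List[float] = []
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                col_idx.append(j)
                vals.append(v)
        # Row i occupies vals[row_ptr[i]:row_ptr[i + 1]].
        row_ptr.append(len(vals))
    return row_ptr, col_idx, vals


def spmv_csr(row_ptr: List[int], col_idx: List[int], vals: List[float],
             x: List[float]) -> List[float]:
    """Compute y = A x, touching only the stored nonzeros of each row."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]  # indirect (irregular) access into x
        y[i] = s
    return y


def bandwidth(row_ptr: List[int], col_idx: List[int]) -> int:
    """Largest |i - j| over stored nonzeros; bandwidth-reducing orderings shrink it."""
    b = 0
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            b = max(b, abs(i - col_idx[k]))
    return b


def group_rows(n_rows: int, rows_per_group: int) -> List[int]:
    """HYPOTHETICAL second level: pointer array over contiguous blocks of rows."""
    return list(range(0, n_rows, rows_per_group)) + [n_rows]


def spmv_two_level(group_ptr: List[int], row_ptr: List[int], col_idx: List[int],
                   vals: List[float], x: List[float]) -> List[float]:
    """Same product as spmv_csr, traversed block by block (HYPOTHETICAL grouping).

    This loop runs sequentially; in a parallel version each block of rows
    could be assigned to one core.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for g in range(len(group_ptr) - 1):
        for i in range(group_ptr[g], group_ptr[g + 1]):
            s = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                s += vals[k] * x[col_idx[k]]
            y[i] = s
    return y


if __name__ == "__main__":
    A = [[4.0, 0.0, 1.0, 0.0],
         [0.0, 3.0, 0.0, 0.0],
         [1.0, 0.0, 2.0, 5.0],
         [0.0, 0.0, 5.0, 6.0]]
    x = [1.0, 2.0, 3.0, 4.0]
    rp, ci, va = dense_to_csr(A)
    print(rp)                        # [0, 2, 3, 6, 8]
    print(spmv_csr(rp, ci, va, x))   # [7.0, 6.0, 27.0, 39.0]
    print(bandwidth(rp, ci))         # 2
    print(spmv_two_level(group_rows(4, 2), rp, ci, va, x))  # [7.0, 6.0, 27.0, 39.0]
```

When the bandwidth is small, the column indices in consecutive rows stay close together, so the entries of x they read tend to still be in cache; this is the kind of x reuse the abstract refers to.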

Pages: 10