A Multilevel Compressed Sparse Row Format for Efficient Sparse Computations on Multicore Processors

Citations: 0
Authors
Kabir, Humayun [1 ]
Booth, Joshua Dennis [1 ]
Raghavan, Padma [1 ]
Affiliations
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Funding
U.S. National Science Foundation;
Keywords
PERFORMANCE;
DOI
Not available
Chinese Library Classification (CLC) Number
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
We seek to improve the performance of sparse matrix computations on multicore processors with non-uniform memory access (NUMA). Typical implementations use a bandwidth reducing ordering of the matrix to increase locality of accesses with a compressed storage format to store and operate only on the non-zero values. We propose a new multilevel storage format and a companion ordering scheme as an explicit adaptation to map to NUMA hierarchies. More specifically, we propose CSR-k, a multilevel form of the popular compressed sparse row (CSR) format for a multicore processor with k > 1 well-differentiated levels in the memory subsystem. Additionally, we develop Band-k, a modified form of a traditional bandwidth reduction scheme, to convert a matrix represented in CSR to our proposed CSR-k. We evaluate the performance of the widely-used and important sparse matrix-vector multiplication (SpMV) kernel using CSR-2 on Intel Westmere processors for a test suite of 12 large sparse matrices with row densities in the range 3 to 45. On 32 cores, on average across all matrices in the test suite, the execution time for SpMV with CSR-2 is less than 42% of the time taken by the state-of-the-art automatically tuned SpMV resulting in energy savings of approximately 56%. Additionally, on average, the parallel speed-up on 32 cores of the automatically tuned SpMV relative to its 1-core performance is 8.18 compared to a value of 19.71 for CSR-2. Our analysis indicates that the higher performance of SpMV with CSR-2 comes from achieving higher reuse of x in the shared L3 cache without incurring overheads from fill-in of original zeroes. Furthermore, the pre-processing costs of SpMV with CSR-2 can be amortized on average over 97 iterations of SpMV using CSR and are substantially lower than the 513 iterations required for the automatically tuned implementation. 
Based on these results, CSR-k appears to be a promising multilevel formulation of CSR for adapting sparse computations to multicore processors with NUMA memory hierarchies.
Pages: 10