Balancing Computation and Communication in Distributed Sparse Matrix-Vector Multiplication

被引：0

作者：

Mi, Hongli ^{[1
]}

Yu, Xiangrui ^{[1
]}

Yu, Xiaosong ^{[1
]}

Wu, Shuangyuan ^{[1
]}

Liu, Weifeng ^{[1
]}

机构：

[1] China Univ Petr, Super Sci Software Lab, Beijing, Peoples R China

来源：

2023 IEEE/ACM 23RD INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING, CCGRID | 2023年

基金：

中国国家自然科学基金;

关键词：

Distributed memory system; sparse matrixvector multiplication; load balancing; PARTITIONING MODELS; ALGORITHM; FORMAT;

D O I：

10.1109/CCGRID57682.2023.00056

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Sparse Matrix-Vector Multiplication (SpMV) is a fundamental operation in a number of scientific and engineering problems. When the sparse matrices processed are large enough, distributed memory systems should be used to accelerate SpMV. At present, the optimization techniques for distributed SpMV mainly focus on reordering through graph or hypergraph partitioning. However, although the reordering could reduce the amount of communications in general, there are still load balancing challenges in computations and communications on distributed platforms that are not well addressed. In this paper, we propose two strategies to optimize SpMV on distributed clusters: (1) resizing the number of row blocks on the nodes for balancing the amount of computations, and (2) adjusting the column number of the diagonal blocks for balancing tasks and reducing communications among compute nodes. The experimental results show that compared with the classic distributed SpMV implementation and its variant reordered with graph partitioning, our algorithm achieves on average 77.20x and 5.18x (up to 460.52x and 27.50x) speedups, respectively. Also, our method bring on average 19.56x (up to 48.49x) speedup over a recently proposed hybrid distributed SpMV algorithm. In addition, our algorithm achieves obviously better scalability over these existing distributed SpMV methods.

引用

页码：535 / 544

页数：10

共 50 条

[1] Communication balancing in parallel sparse matrix-vector multiplication
Bisseling, RH
Meesen, W
[J]. ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2005, 21 : 47 - 65
[2] Load-balancing in sparse matrix-vector multiplication
Nastea, SG
Frieder, O
ElGhazawi, T
[J]. EIGHTH IEEE SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING, PROCEEDINGS, 1996, : 218 - 225
[3] Sparse Matrix-Vector Multiplication on GPGPUs
Filippone, Salvatore
Cardellini, Valeria
Barbieri, Davide
Fanfarillo, Alessandro
[J]. ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2017, 43 (04):
[4] An Efficient Sparse Matrix-Vector Multiplication on Distributed Memory Parallel Computers
Shahnaz, Rukhsana
Usman, Anila
[J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2007, 7 (01): : 77 - 82
[5] GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication
Tao, Yuan
Deng, Yangdong
Mu, Shuai
Zhang, Zhenzhong
Zhu, Mingfa
Xiao, Limin
Ruan, Li
[J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, 27 (14): : 3771 - 3789
[6] Evaluating the Performance Impact of Communication Imbalance in Sparse Matrix-Vector Multiplication
Utrera, Gladys
Gil, Marisa
Martorell, Xavier
[J]. 23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015), 2015, : 321 - 328
[7] Vector ISA extension for sparse matrix-vector multiplication
Vassiliadis, S
Cotofana, S
Stathis, P
[J]. EURO-PAR'99: PARALLEL PROCESSING, 1999, 1685 : 708 - 715
[8] Node aware sparse matrix-vector multiplication
Bienz, Amanda
Gropp, William D.
Olson, Luke N.
[J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2019, 130 : 166 - 178
[9] Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer
DuBois, David
DuBois, Andrew
Connor, Carolyn
Poole, Steve
[J]. PROCEEDINGS OF THE SIXTEENTH IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, 2008, : 239 - +
[10] Understanding the performance of sparse matrix-vector multiplication
Goumas, Georgios
Kourtis, Kornilios
Anastopoulos, Nikos
Karakasis, Vasileios
Koziris, Nectarios
[J]. PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008, : 283 - +

← 1 2 3 4 5 →