Exploring the Design Space of Distributed Parallel Sparse Matrix-Multiple Vector Multiplication

Cited by: 0
Authors
Huang, Hua [1 ]
Chow, Edmond [1 ]
Affiliations
[1] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Keywords
Sparse matrices; Partitioning algorithms; Vectors; Costs; Three-dimensional displays; Space exploration; Optimization; SpMM; SpMV; distributed-memory matrix multiplication; communication optimization; OPTIMIZATION; PERFORMANCE; FRAMEWORK;
DOI
10.1109/TPDS.2024.3452478
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
We consider the distributed memory parallel multiplication of a sparse matrix by a dense matrix (SpMM). The dense matrix is often a collection of dense vectors. Standard implementations will multiply the sparse matrix by multiple dense vectors at the same time, to exploit the computational efficiencies therein. But such approaches generally utilize the same sparse matrix partitioning as if multiplying by a single vector. This article explores the design space of parallelizing SpMM and shows that a coarser grain partitioning of the matrix combined with a column-wise partitioning of the block of vectors can often require less communication volume and achieve higher SpMM performance. An algorithm is presented that chooses a process grid geometry for a given number of processes to optimize the performance of parallel SpMM. The algorithm can augment existing graph partitioners by utilizing the additional concurrency available when multiplying by multiple dense vectors to further reduce communication.
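To make the setting concrete, below is a minimal sketch (not the authors' algorithm) of the standard baseline the abstract contrasts against: a 1D row partitioning of the sparse matrix, applied to the full block of dense vectors, using mpi4py and SciPy. All names and sizes (A_local, X, m, k) are illustrative assumptions; the paper's contribution of partitioning the block of vectors column-wise over a chosen process grid is deliberately not shown here.

from mpi4py import MPI
import numpy as np
import scipy.sparse as sp

comm = MPI.COMM_WORLD
rank, p = comm.Get_rank(), comm.Get_size()

# Global problem sizes (assumed for the demo): m x m sparse matrix, k dense vectors.
m, k = 10_000, 16
rows = np.array_split(np.arange(m), p)[rank]   # this rank's contiguous row block

# Each rank owns one block of rows of the sparse matrix (random for illustration).
A_local = sp.random(len(rows), m, density=1e-4, format="csr", random_state=rank)

# Simplification: replicate the whole dense block X on every rank; a real
# distributed SpMM communicates only the rows of X that each rank needs,
# which is exactly the communication volume the paper seeks to reduce.
X = np.random.rand(m, k) if rank == 0 else None
X = comm.bcast(X, root=0)

Y_local = A_local @ X                          # local SpMM on the owned row block
Y_blocks = comm.gather(Y_local, root=0)        # collect the row blocks of the result
if rank == 0:
    Y = np.vstack(Y_blocks)
    print("Y shape:", Y.shape)

Run with, e.g., "mpiexec -n 4 python spmm_baseline.py" (script name assumed). Because every rank handles all k columns of X, this baseline uses the same matrix partitioning it would for a single vector, which is the design point the article moves away from.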
Pages: 1977-1988
Page count: 12