Exploring the Design Space of Distributed Parallel Sparse Matrix-Multiple Vector Multiplication

Cited by: 0
Authors
Huang, Hua [1 ]
Chow, Edmond [1 ]
Affiliations
[1] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Keywords
Sparse matrices; Partitioning algorithms; Vectors; Costs; Three-dimensional displays; Space exploration; Optimization; SpMM; SpMV; distributed-memory matrix multiplication; communication optimization; OPTIMIZATION; PERFORMANCE; FRAMEWORK;
DOI
10.1109/TPDS.2024.3452478
CLC classification
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
We consider the distributed memory parallel multiplication of a sparse matrix by a dense matrix (SpMM). The dense matrix is often a collection of dense vectors. Standard implementations will multiply the sparse matrix by multiple dense vectors at the same time, to exploit the computational efficiencies therein. But such approaches generally utilize the same sparse matrix partitioning as if multiplying by a single vector. This article explores the design space of parallelizing SpMM and shows that a coarser grain partitioning of the matrix combined with a column-wise partitioning of the block of vectors can often require less communication volume and achieve higher SpMM performance. An algorithm is presented that chooses a process grid geometry for a given number of processes to optimize the performance of parallel SpMM. The algorithm can augment existing graph partitioners by utilizing the additional concurrency available when multiplying by multiple dense vectors to further reduce communication.
Pages: 1977-1988
Page count: 12
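To make the partitioning trade-off described in the abstract concrete, the sketch below models the communication volume of distributed SpMM on a pr x pc process grid: block rows of the sparse matrix are assigned to the pr row groups, and the k dense vectors are split column-wise across the pc column groups. This is a minimal, illustrative model under simplifying assumptions (square matrix, contiguous row blocks, rows of the dense block distributed conformally with the matrix rows); it is not the algorithm presented in the paper, and all names and parameters here (modeled_volumes, ghost_rows, the 5-point Laplacian test matrix, p = 16, k = 8) are hypothetical choices for the example.

import numpy as np
import scipy.sparse as sp

def contiguous_row_partition(n, pr):
    """Split n rows into pr nearly equal contiguous blocks [r0, r1)."""
    bounds = np.linspace(0, n, pr + 1, dtype=int)
    return list(zip(bounds[:-1], bounds[1:]))

def ghost_rows(A_csr, row_part):
    """For each block row of A, count the distinct nonzero column indices that
    lie outside the block's own row range.  Under a conformal row distribution
    of the dense block X, these are the rows of X that must be received."""
    ghosts = []
    for r0, r1 in row_part:
        cols = np.unique(A_csr[r0:r1, :].indices)   # distinct nonzero columns
        ghosts.append(int(np.count_nonzero((cols < r0) | (cols >= r1))))
    return ghosts

def modeled_volumes(A_csr, p, k):
    """Enumerate pr x pc process grids with pr * pc == p and pc <= k.
    Modeled receive volume for X: each of the pc processes sharing block row i
    fetches ghost_i rows of X, but only k / pc columns wide, so the total is
    k * sum(ghost_i) -- which shrinks as the row partition gets coarser."""
    n = A_csr.shape[0]
    results = []
    for pr in range(1, p + 1):
        if p % pr != 0 or (p // pr) > k:
            continue
        pc = p // pr
        ghosts = ghost_rows(A_csr, contiguous_row_partition(n, pr))
        results.append((pr, pc, k * sum(ghosts)))
    return results

if __name__ == "__main__":
    # Hypothetical test matrix: 2-D 5-point Laplacian on a 64 x 64 grid.
    m = 64
    T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m))
    A = (sp.kron(sp.eye(m), T) + sp.kron(T, sp.eye(m))).tocsr()
    p, k = 16, 8                                    # processes, dense vectors
    for pr, pc, vol in modeled_volumes(A, p, k):
        print(f"grid {pr:2d} x {pc:2d}: modeled X receive volume = {vol}")

Coarser row partitions (smaller pr) shrink the modeled receive volume for the dense block but require each block row of A to be shared by more processes, which this simple count ignores; the algorithm in the paper chooses the process grid geometry to optimize actual SpMM performance rather than such a simplified volume model.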
Related papers
50 entries in total
  • [41] Optimization of Block Sparse Matrix-Vector Multiplication on Shared-Memory Parallel Architectures
    Eberhardt, Ryan
    Hoemmen, Mark
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 663 - 672
  • [42] A two-dimensional data distribution method for parallel sparse matrix-vector multiplication
    Vastenhouw, B.
    Bisseling, R. H.
    SIAM REVIEW, 2005, 47 (01) : 67 - 95
  • [43] A Novel Multi-GPU Parallel Optimization Model for The Sparse Matrix-Vector Multiplication
    Gao, Jiaquan
    Zhou, Yuanshen
    Wu, Kesong
    PARALLEL PROCESSING LETTERS, 2016, 26 (04)
  • [44] A Novel Parallel Scan for Multicore Processors and Its Application in Sparse Matrix-Vector Multiplication
    Zhang, Nan
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2012, 23 (03) : 397 - 404
  • [45] Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512
    Zhang, Hong
    Mills, Richard T.
    Rupp, Karl
    Smith, Barry F.
    PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2018,
  • [46] Exploring Better Speculation and Data Locality in Sparse Matrix-Vector Multiplication on Intel Xeon
    Zhao, Haoran
    Xia, Tian
    Li, Chenyang
    Zhao, Wenzhe
    Zheng, Nanning
    Ren, Pengju
    2020 IEEE 38TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2020), 2020, : 601 - 609
  • [47] Data Distributions for Sparse-Matrix Vector Multiplication
    Romero, L. F.
    Zapata, E. L.
    PARALLEL COMPUTING, 1995, 21 (04) : 583 - 605
  • [48] Sparse-Matrix Vector Multiplication on Distributed Architectures: Lower Bounds and Average Complexity Results
    Manzini, G.
    INFORMATION PROCESSING LETTERS, 1994, 50 (05) : 231 - 238
  • [49] Understanding the performance of sparse matrix-vector multiplication
    Goumas, Georgios
    Kourtis, Kornilios
    Anastopoulos, Nikos
    Karakasis, Vasileios
    Koziris, Nectarios
    PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008, : 283 - +
  • [50] Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer
    DuBois, David
    DuBois, Andrew
    Connor, Carolyn
    Poole, Steve
    PROCEEDINGS OF THE SIXTEENTH IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, 2008, : 239 - +