Exploring the Design Space of Distributed Parallel Sparse Matrix-Multiple Vector Multiplication

Cited by: 0
Authors
Huang, Hua [1 ]
Chow, Edmond [1 ]
Affiliation
[1] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Keywords
Sparse matrices; Partitioning algorithms; Vectors; Costs; Three-dimensional displays; Space exploration; Optimization; SpMM; SpMV; distributed-memory matrix multiplication; communication optimization; OPTIMIZATION; PERFORMANCE; FRAMEWORK;
DOI
10.1109/TPDS.2024.3452478
CLC Classification Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
We consider the distributed-memory parallel multiplication of a sparse matrix by a dense matrix (SpMM), where the dense matrix is often a collection of dense vectors. Standard implementations multiply the sparse matrix by multiple dense vectors at the same time to exploit the computational efficiencies of doing so, but they generally use the same sparse matrix partitioning that would be used when multiplying by a single vector. This article explores the design space of parallelizing SpMM and shows that a coarser-grained partitioning of the matrix, combined with a column-wise partitioning of the block of vectors, can often require less communication volume and achieve higher SpMM performance. An algorithm is presented that chooses a process grid geometry for a given number of processes to optimize parallel SpMM performance. The algorithm can augment existing graph partitioners by exploiting the additional concurrency available when multiplying by multiple dense vectors to further reduce communication.
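The grid-selection idea in the abstract can be illustrated with a toy cost model. The following sketch is only an illustration: the names (modeled_comm_words, choose_grid) and the SUMMA-like cost model are assumptions, not the algorithm or model from the paper. It enumerates pr x pc factorizations of the process count and picks the geometry with the smallest modeled communication; as the number of vectors grows, splitting the vector block column-wise (larger pc) becomes increasingly favorable.

from math import inf

def grid_candidates(p):
    """All (pr, pc) process-grid factorizations of p processes."""
    return [(pr, p // pr) for pr in range(1, p + 1) if p % pr == 0]

def modeled_comm_words(m, n_vec, nnz, pr, pc):
    """Per-process communication volume (in words) under a simple SUMMA-like
    2D model: blocks of the sparse matrix travel along process rows, panels
    of the dense vector block travel along process columns.  This cost model
    is an illustrative assumption, not the model used in the paper."""
    a_term = (nnz / (pr * pc)) * (pc - 1)        # sparse-matrix traffic
    x_term = (m / pr) * (n_vec / pc) * (pr - 1)  # dense vector-block traffic
    return a_term + x_term

def choose_grid(m, n_vec, nnz, p):
    """Return the pr x pc geometry with the smallest modeled communication."""
    best, best_cost = None, inf
    for pr, pc in grid_candidates(p):
        cost = modeled_comm_words(m, n_vec, nnz, pr, pc)
        if cost < best_cost:
            best, best_cost = (pr, pc), cost
    return best

# Example: 1M-row matrix with ~50 nonzeros per row, 32 vectors, 256 processes.
print(choose_grid(m=10**6, n_vec=32, nnz=50 * 10**6, p=256))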
Pages: 1977 - 1988
Number of pages: 12
Related Articles
50 records
  • [1] Parallel Computation of Sparse Matrix Vector Multiplication
    Yin, Wei
    He, Yu
    [J]. 2011 INTERNATIONAL CONFERENCE ON FUTURE COMPUTER SCIENCE AND APPLICATION (FCSA 2011), VOL 3, 2011: 196 - 199
  • [2] An Efficient Sparse Matrix-Vector Multiplication on Distributed Memory Parallel Computers
    Shahnaz, Rukhsana
    Usman, Anila
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2007, 7 (01): 77 - 82
  • [3] Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations
    Aktulga, Hasan Metin
    Buluc, Aydin
    Williams, Samuel
    Yang, Chao
    [J]. 2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014
  • [4] Blocked-Based Sparse Matrix-Vector Multiplication on Distributed Memory Parallel Computers
    Shahnaz, Rukhsana
    Usman, Anila
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2011, 8 (02): 130 - 136
  • [5] Adaptive Runtime Tuning of Parallel Sparse Matrix-Vector Multiplication on Distributed Memory Systems
    Lee, Seyong
    Eigenmann, Rudolf
    [J]. ICS'08: PROCEEDINGS OF THE 2008 ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, 2008: 195 - 204
  • [6] Parallel Sparse Matrix-Vector Multiplication Using Accelerators
    Maeda, Hiroshi
    Takahashi, Daisuke
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2016, PT II, 2016, 9787: 3 - 18
  • [7] Communication balancing in parallel sparse matrix-vector multiplication
    Bisseling, RH
    Meesen, W
    [J]. ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2005, 21: 47 - 65
  • [8] Sparse matrix-vector multiplication design on FPGAs
    Sun, Junqing
    Peterson, Gregory
    Storaasli, Olaf
    [J]. FCCM 2007: 15TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2007: 349+
  • [9] Merge-based Parallel Sparse Matrix-Sparse Vector Multiplication with a Vector Architecture
    Li, Haoran
    Yokoyama, Harumichi
    Araki, Takuya
    [J]. IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2018: 43 - 50
  • [10] Parallel Sparse Matrix-Vector and Matrix-Transpose-Vector Multiplication Using Compressed Sparse Blocks
    Buluc, Aydin
    Fineman, Jeremy T.
    Frigo, Matteo
    Gilbert, John R.
    Leiserson, Charles E.
    [J]. SPAA'09: PROCEEDINGS OF THE TWENTY-FIRST ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2009: 233 - 244