Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication

Cited by: 18
Authors
Akbudak, Kadir [1]
Selvitopi, Oguz [1]
Aykanat, Cevdet [1]
Affiliations
[1] Bilkent Univ, Comp Engn Dept, TR-06800 Ankara, Turkey
Keywords
Sparse matrix-matrix multiplication; SpGEMM; hypergraph partitioning; graph partitioning; communication cost; bandwidth; latency
DOI
10.1145/3155292
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We investigate outer-product-parallel, inner-product-parallel, and row-by-row-product-parallel formulations of sparse matrix-matrix multiplication (SpGEMM) on distributed memory architectures. For each of these three formulations, we propose a hypergraph model and a bipartite graph model for distributing SpGEMM computations based on one-dimensional (1D) partitioning of input matrices. We also propose a communication hypergraph model for each formulation for distributing communication operations. The computational graph and hypergraph models adopted in the first phase aim at minimizing the total message volume and balancing the computational loads of processors, whereas the communication hypergraph models adopted in the second phase aim at minimizing the total message count and balancing the message volume loads of processors. That is, the computational partitioning models reduce the bandwidth cost and the communication hypergraph models reduce the latency cost. Our extensive parallel experiments on up to 2048 processors for a wide range of realistic SpGEMM instances show that although the outer-product-parallel formulation scales better, the row-by-row-product-parallel formulation is more viable due to its significantly lower partitioning overhead and competitive scalability. For computational partitioning models, our experimental findings indicate that the proposed bipartite graph models are attractive alternatives to their hypergraph counterparts because of their lower partitioning overhead. Finally, we show that by using the communication hypergraph models to reduce the latency cost in addition to the bandwidth cost, the parallel SpGEMM time can be further improved by up to 32%.
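The three formulations named in the abstract differ in which unit of work is treated as atomic when computing C = AB. The following is a minimal sequential sketch of each, not the authors' partitioning models or their distributed implementation. The dictionary-of-coordinates storage and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

# A sparse matrix is stored as {(row, col): value}; this layout is an
# illustrative assumption, not the storage scheme used in the paper.

def to_rows(entries):
    """Group nonzeros by row: {row: {col: value}}."""
    rows = defaultdict(dict)
    for (i, j), v in entries.items():
        rows[i][j] = v
    return rows

def to_cols(entries):
    """Group nonzeros by column: {col: {row: value}}."""
    cols = defaultdict(dict)
    for (i, j), v in entries.items():
        cols[j][i] = v
    return cols

def spgemm_outer(A, B):
    """Outer-product form: C is the sum over k of A[:, k] times B[k, :]."""
    C = defaultdict(float)
    a_cols, b_rows = to_cols(A), to_rows(B)
    for k in a_cols.keys() & b_rows.keys():
        for i, a in a_cols[k].items():
            for j, b in b_rows[k].items():
                C[(i, j)] += a * b
    return dict(C)

def spgemm_inner(A, B):
    """Inner-product form: C[i, j] is the dot product of A[i, :] and B[:, j]."""
    C = {}
    a_rows, b_cols = to_rows(A), to_cols(B)
    for i, row in a_rows.items():
        for j, col in b_cols.items():
            common = row.keys() & col.keys()
            if common:
                C[(i, j)] = sum(row[k] * col[k] for k in common)
    return C

def spgemm_row_by_row(A, B):
    """Row-by-row form: C[i, :] is the sum over k of A[i, k] times B[k, :]."""
    C = defaultdict(float)
    a_rows, b_rows = to_rows(A), to_rows(B)
    for i, row in a_rows.items():
        for k, a in row.items():
            for j, b in b_rows.get(k, {}).items():
                C[(i, j)] += a * b
    return dict(C)

# Small example with integer values so all three results compare exactly.
A = {(0, 0): 1, (0, 2): 2, (1, 1): 3}
B = {(0, 1): 4, (1, 0): 5, (2, 1): 6}
C_outer = spgemm_outer(A, B)
C_inner = spgemm_inner(A, B)
C_row = spgemm_row_by_row(A, B)
```

All three functions produce the same C. In a 1D-partitioned parallel setting, what changes between them is which data a processor must receive or send: the outer-product form produces partial results for C that must be combined, while the row-by-row form needs rows of B selected by the nonzeros of the locally owned rows of A.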
Pages: 34
Related papers
50 in total
  • [31] Parallel Algorithm for Quasi-Band Matrix-Matrix Multiplication
    Vooturi, Dharma Teja
    Kothapalli, Kishore
    [J]. PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT I, 2016, 9573 : 106 - 115
  • [32] An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data
    Liu, Weifeng
    Vinter, Brian
    [J]. 2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [33] Column-Segmented Sparse Matrix-Matrix Multiplication on Multicore CPUs
    An, Xiaojing
    Catalyurek, Umit, V
    [J]. 2021 IEEE 28TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2021), 2021, : 202 - 211
  • [34] Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs
    Wei, Bingxin
    Wang, Yizhuo
    Chang, Fangli
    Gao, Jianhua
    Ji, Weixing
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2024, 38 (03): : 245 - 259
  • [35] Generalized Sparse Matrix-Matrix Multiplication for Vector Engines and Graph Applications
    Li, Jiayu
    Wang, Fugang
    Araki, Takuya
    Qiu, Judy
    [J]. PROCEEDINGS OF MCHPC'19: 2019 IEEE/ACM WORKSHOP ON MEMORY CENTRIC HIGH PERFORMANCE COMPUTING (MCHPC), 2019, : 33 - 42
  • [36] A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors
    Liu, Weifeng
    Vinter, Brian
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2015, 85 : 47 - 61
  • [37] POSTER: Parallel Algorithms for Masked Sparse Matrix-Matrix Products
    Milakovic, Srdan
    Selvitopi, Oguz
    Nisa, Israt
    Budimlic, Zoran
    Buluc, Aydin
    [J]. PPOPP'22: PROCEEDINGS OF THE 27TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, 2022, : 453 - 454
  • [38] Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures
    Deveci, Mehmet
    Trott, Christian
    Rajamanickam, Sivasankaran
    [J]. PARALLEL COMPUTING, 2018, 78 : 33 - 46
  • [39] Register-based Implementation of the Sparse General Matrix-Matrix Multiplication on GPUs
    Liu, Junhong
    He, Xin
    Liu, Weifeng
    Tan, Guangming
    [J]. ACM SIGPLAN NOTICES, 2018, 53 (01) : 407 - 408
  • [40] Exploiting Locality in Sparse Matrix-Matrix Multiplication on Many-Core Architectures
    Akbudak, Kadir
    Aykanat, Cevdet
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (08) : 2258 - 2271