Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model

Cited by: 12
Authors
Rubensson, Emanuel H. [1]
Rudberg, Elias [1]
Affiliation
[1] Uppsala Univ, Div Comp Sci, Dept Informat Technol, Box 337, SE-75105 Uppsala, Sweden
Funding
Swedish Research Council
Keywords
Parallel computing; Sparse matrix-matrix multiplication; Scalable algorithms; Large-scale computing; Graphics processing units; DENSITY-MATRIX; IMPLEMENTATION; PERFORMANCE; DESIGN; SYSTEM; COSTS
DOI
10.1016/j.parco.2016.06.005
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
We present a method for parallel block-sparse matrix-matrix multiplication on distributed-memory clusters. By using a quadtree matrix representation, data locality is exploited without prior information about the matrix sparsity pattern. A distributed quadtree matrix representation is straightforward to implement due to our recent development of the Chunks and Tasks programming model [Parallel Comput. 40, 328 (2014)]. The quadtree representation combined with the Chunks and Tasks model leads to favorable weak and strong scaling of the communication cost with the number of processes, as shown both theoretically and in numerical experiments. Matrices are represented by sparse quadtrees of chunk objects. The leaves in the hierarchy are block-sparse submatrices. Sparsity is dynamically detected by the matrix library and may occur at any level in the hierarchy and/or within the submatrix leaves. When graphics processing units (GPUs) are available, both CPUs and GPUs are used for leaf-level multiplication work, thus making use of the full computing capacity of each node. The performance is evaluated for matrices with different sparsity structures, including examples from electronic structure calculations. Compared to methods that do not exploit data locality, our locality-aware approach reduces communication significantly, achieving essentially constant communication per node in weak scaling tests. (C) 2016 Elsevier B.V. All rights reserved.
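The core idea in the abstract, a quadtree in which absent children stand for zero submatrices so that sparsity is detected at any level without knowing the pattern in advance, can be illustrated with a minimal serial sketch. This is not the authors' Chunks and Tasks library: it runs in a single process, its leaves are dense NumPy arrays (the paper's leaves are themselves block-sparse), and the names `QuadNode`, `from_dense`, `multiply`, `LEAF` and `STATS` are hypothetical. It assumes square matrices whose dimension is `LEAF` times a power of two.

```python
import numpy as np

# Hypothetical leaf dimension: recursion stops at LEAF x LEAF dense blocks.
LEAF = 2
# Counts leaf-level block products actually performed (illustration only).
STATS = {"leaf_mults": 0}


class QuadNode:
    """Square quadtree matrix node; a child equal to None is an all-zero submatrix."""

    def __init__(self, n, children=None, block=None):
        self.n = n                # dimension of the submatrix this node represents
        self.children = children  # [[c00, c01], [c10, c11]] for internal nodes
        self.block = block        # dense NumPy array for leaf nodes


def make_node(n, kids):
    """Build an internal node, or return None if every child is zero."""
    if all(k is None for row in kids for k in row):
        return None
    return QuadNode(n, children=kids)


def from_dense(a):
    """Convert a dense n x n array (n = LEAF * 2**k) to a sparse quadtree."""
    n = a.shape[0]
    if not a.any():
        return None  # zero submatrices are not stored at any level
    if n <= LEAF:
        return QuadNode(n, block=a.copy())
    h = n // 2
    kids = [[from_dense(a[:h, :h]), from_dense(a[:h, h:])],
            [from_dense(a[h:, :h]), from_dense(a[h:, h:])]]
    return make_node(n, kids)


def add(x, y):
    """Sum of two quadtrees of equal dimension; None acts as zero."""
    if x is None:
        return y
    if y is None:
        return x
    if x.block is not None:  # both are leaves, since leafness depends only on n
        return QuadNode(x.n, block=x.block + y.block)
    kids = [[add(x.children[i][j], y.children[i][j]) for j in range(2)]
            for i in range(2)]
    return make_node(x.n, kids)


def multiply(x, y):
    """Recursive quadtree product: C_ij = sum over k of A_ik * B_kj."""
    if x is None or y is None:
        return None  # a zero factor prunes the whole subtree of work
    if x.block is not None:
        STATS["leaf_mults"] += 1
        p = x.block @ y.block
        return QuadNode(x.n, block=p) if p.any() else None
    kids = [[None, None], [None, None]]
    for i in range(2):
        for j in range(2):
            acc = None
            for k in range(2):
                acc = add(acc, multiply(x.children[i][k], y.children[k][j]))
            kids[i][j] = acc
    return make_node(x.n, kids)


def to_dense(node, n):
    """Expand a quadtree back to a dense n x n array (for checking)."""
    if node is None:
        return np.zeros((n, n))
    if node.block is not None:
        return node.block.copy()
    h = n // 2
    c = node.children
    return np.block([[to_dense(c[0][0], h), to_dense(c[0][1], h)],
                     [to_dense(c[1][0], h), to_dense(c[1][1], h)]])


# Usage: an 8 x 8 upper-bidiagonal matrix, so many 2 x 2 blocks are zero.
n = 8
a = np.diag(np.arange(1.0, 9.0)) + np.diag(np.ones(7), 1)
qa = from_dense(a)
c = multiply(qa, qa)
assert np.allclose(to_dense(c, n), a @ a)
print("leaf products:", STATS["leaf_mults"], "of", (n // LEAF) ** 3)
```

Because the lower-left quadrant of the example matrix is entirely zero, whole branches of the recursion are skipped, and only 12 of the 64 block products a dense blocked algorithm would perform are carried out. The distribution of chunks across processes, the block-sparse leaves and the CPU/GPU leaf work described in the abstract are not modelled in this sketch.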
Pages: 87-106
Number of pages: 20