High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning

Cited by: 34
Authors
Cevahir, Ali [1]
Nukada, Akira [3]
Matsuoka, Satoshi [2, 4]
Affiliations
[1] Tokyo Inst Technol, Dept Math & Comp Sci, Meguro Ku, Ookayama 2-12-1, Tokyo 1528552, Japan
[2] Tokyo Inst Technol, Natl Inst Informat, JST CREST, Chiyoda Ku, Tokyo 1018430, Japan
[3] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan
[4] Natl Inst Informat, Chiyoda Ku, Tokyo 1018430, Japan
Source
Funding
Japan Science and Technology Agency
Keywords
GPU computing; GPU cluster; Conjugate Gradients; Hypergraph partitioning;
DOI
10.1007/s00450-010-0112-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Motivated by the high computational power and low price-to-performance ratio of GPUs, GPU-accelerated clusters are being built for high-performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster, where each cluster node has multiple GPUs. The basic computations of the solver are performed on GPUs, and communication is managed by the CPU. For sparse matrix-vector multiplication, which is the most time-consuming operation, the solver selects the fastest of several high-performance kernels running on GPUs. Scalability is harder to obtain in a GPU-extended cluster than in a traditional CPU cluster: because GPUs compute much faster than CPUs, GPU-extended clusters demand faster communication between compute units. To achieve scalability, we adopt hypergraph-partitioning models, which are state-of-the-art models for communication reduction and load balancing in parallel sparse iterative solvers. We implement a hierarchical partitioning model that better optimizes for the underlying heterogeneous system. In our experiments, we obtain up to 94 Gflops of double-precision CG performance using 64 NVIDIA Tesla GPUs on 32 nodes.
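For orientation, the abstract's two core operations (the CG iteration and the sparse matrix-vector product that dominates its cost) can be sketched as follows. This is a minimal single-process, CPU-only illustration in plain Python, not the authors' multi-GPU implementation: it assumes a symmetric positive-definite matrix stored in CSR format, and the names `spmv_csr`, `dot`, and `conjugate_gradient` are hypothetical. Kernel selection, hypergraph partitioning, and inter-node communication are not shown.

```python
import math


def spmv_csr(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A @ x for A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A (CSR), starting from x = 0.

    Returns (x, iterations_used). Stops when the residual 2-norm is <= tol.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual r = b - A x, with x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    if math.sqrt(rs_old) <= tol:
        return x, 0
    for it in range(1, max_iter + 1):
        Ap = spmv_csr(row_ptr, col_idx, values, p)   # the dominant cost per iteration
        alpha = rs_old / dot(p, Ap)                  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if math.sqrt(rs_new) <= tol:
            return x, it
        beta = rs_new / rs_old                       # direction update coefficient
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter
```

Each iteration performs one sparse matrix-vector product, two inner products, and three vector updates. In a distributed setting, the matrix-vector product and the inner products are the steps that require communication, which is why the partitioning of matrix rows across compute units matters for scalability.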
Pages: 83-91
Page count: 9
Related papers
50 in total
  • [41] Two-stage Asynchronous Iterative Solvers for multi-GPU Clusters
    Nayak, Pratik
    Cojean, Terry
    Anzt, Hartwig
    PROCEEDINGS OF SCALA 2020: 11TH WORKSHOP ON LATEST ADVANCES IN SCALABLE ALGORITHMS FOR LARGE-SCALE SYSTEMS, 2020, : 9 - 18
  • [42] High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications
    Abdelfattah, Ahmad
    Ltaief, Hatem
    Keyes, David
    EURO-PAR 2015: PARALLEL PROCESSING, 2015, 9233 : 601 - 612
  • [43] MGPUSim: Enabling Multi-GPU Performance Modeling and Optimization
    Sun, Yifan
    Baruah, Trinayan
    Mojumder, Saiful A.
    Dong, Shi
    Gong, Xiang
    Treadway, Shane
    Bao, Yuhui
    Hance, Spencer
    McCardwell, Carter
    Zhao, Vincent
    Barclay, Harrison
    Ziabari, Amir Kavyan
    Chen, Zhongliang
    Ubal, Rafael
    Abellan, Jose L.
    Kim, John
    Joshi, Ajay
    Kaeli, David
    PROCEEDINGS OF THE 2019 46TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '19), 2019, : 197 - 209
  • [44] Multi-GPU performance optimization of a computational fluid dynamics code using OpenACC
    Xue, Weicheng
    Roy, Christopher J.
    Concurrency and Computation: Practice and Experience, 2021, 33 (05)
  • [45] Performance Optimization for SpMV on Multi-GPU Systems Using Threads and Multiple Streams
    Guo, Ping
    Zhang, Changjiang
    2016 28TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING WORKSHOPS (SBAC-PADW), 2016, : 67 - 72
  • [46] Towards High-Performance Code Generation for Multi-GPU Clusters Based on a Domain-Specific Language for Algorithmic Skeletons
    Wrede, Fabian
    Kuchen, Herbert
    International Journal of Parallel Programming, 2020, 48 : 713 - 728
  • [47] Solving incompressible two-phase flows on multi-GPU clusters
    Zaspel, Peter
    Griebel, Michael
    COMPUTERS & FLUIDS, 2013, 80 : 356 - 364
  • [48] New multi-GPU implementation for smoothed particle hydrodynamics on heterogeneous clusters
    Dominguez, J. M.
    Crespo, A. J. C.
    Valdez-Balderas, D.
    Rogers, B. D.
    Gomez-Gesteira, M.
    COMPUTER PHYSICS COMMUNICATIONS, 2013, 184 (08) : 1848 - 1860
  • [50] Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes
    Jo, Gangwon
    Nah, Jeongho
    Lee, Jun
    Kim, Jungwon
    Lee, Jaejin
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015, 26 (07) : 1814 - 1825