High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning

Cited by: 34
Authors
Cevahir, Ali [1]
Nukada, Akira [3]
Matsuoka, Satoshi [2, 4]
Affiliations
[1] Tokyo Inst Technol, Dept Math & Comp Sci, Meguro Ku, Ookayama 2-12-1, Tokyo 1528552, Japan
[2] Tokyo Inst Technol, Natl Inst Informat, JST CREST, Chiyoda Ku, Tokyo 1018430, Japan
[3] Tokyo Inst Technol, Meguro Ku, Tokyo 1528552, Japan
[4] Natl Inst Informat, Chiyoda Ku, Tokyo 1018430, Japan
Source
Funding
Japan Science and Technology Agency
Keywords
GPU computing; GPU cluster; Conjugate Gradients; Hypergraph partitioning
DOI
10.1007/s00450-010-0112-6
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Motivated by the high computational power and low price-to-performance ratio of GPUs, GPU-accelerated clusters are being built for high-performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster, where each cluster node has multiple GPUs. The basic computations of the solver are performed on the GPUs, and communication is managed by the CPU. For sparse matrix-vector multiplication, which is the most time-consuming operation, the solver selects the fastest among several high-performance kernels running on the GPUs. Scalability is harder to obtain on a GPU-extended cluster than on a traditional CPU cluster: because GPUs compute much faster than CPUs, GPU-extended clusters demand faster communication between compute units. To achieve scalability, we adopt hypergraph-partitioning models, which are state-of-the-art models for communication reduction and load balancing in parallel sparse iterative solvers. We implement a hierarchical partitioning model that better optimizes for the underlying heterogeneous system. In our experiments, we obtain up to 94 Gflops of double-precision CG performance using 64 NVIDIA Tesla GPUs on 32 nodes.
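For readers unfamiliar with the method named in the abstract, the following is a minimal single-process sketch of unpreconditioned CG built on a sparse matrix-vector product in CSR form. It is not the authors' implementation: it runs on the CPU in plain Python, uses no GPU kernels, no partitioning, and no inter-node communication, and the function names (`csr_spmv`, `conjugate_gradient`), the tolerance, and the small test matrix are all assumptions made for illustration.

```python
import math


def csr_spmv(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (no preconditioner)."""
    x = [0.0] * len(b)
    r = list(b)            # residual r = b - A x, with the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = csr_spmv(indptr, indices, data, p)   # the dominant cost per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 3x3 SPD test matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]] in CSR form.
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0]
x = conjugate_gradient(indptr, indices, data, b)
```

Each iteration performs one sparse matrix-vector product and two inner products. In a distributed setting the matrix-vector product requires exchanging vector entries between parts and the inner products require global reductions, which is the kind of communication the abstract says partitioning is meant to reduce.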
Pages: 83-91
Page count: 9
Related papers
50 in total
  • [1] A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform
    Ament, M.
    Knittel, G.
    Weiskopf, D.
    Strasser, W.
    PROCEEDINGS OF THE 18TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2010, : 583 - 592
  • [2] A multi-GPU parallel optimization model for the preconditioned conjugate gradient algorithm
    Gao, Jiaquan
    Zhou, Yuanshen
    He, Guixia
    Xia, Yifei
    PARALLEL COMPUTING, 2017, 63 : 1 - 16
  • [3] Multi-GPU Design and Performance Evaluation of Homomorphic Encryption on GPU Clusters
    Al Badawi, Ahmad
    Veeravalli, Bharadwaj
    Lin, Jie
    Xiao, Nan
    Kazuaki, Matsumura
    Khin Mi Mi, Aung
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (02) : 379 - 391
  • [4] Data Partitioning on Multicore and Multi-GPU Platforms Using Functional Performance Models
    Zhong, Ziming
    Rychkov, Vladimir
    Lastovetsky, Alexey
    IEEE TRANSACTIONS ON COMPUTERS, 2015, 64 (09) : 2506 - 2518
  • [5] MULTI-GPU DGEMM AND HIGH PERFORMANCE LINPACK ON HIGHLY ENERGY-EFFICIENT CLUSTERS
    Rohr, David
    Bach, Matthias
    Kretz, Matthias
    Lindenstruth, Volker
    IEEE MICRO, 2011, 31 (05) : 18 - 26
  • [6] High-performance multi-GPU solver for describing nonlinear acoustic waves in homogeneous thermoviscous media
    Diaz, Manuel A.
    Solovchuk, Maxim A.
    Sheu, Tony W. H.
    COMPUTERS & FLUIDS, 2018, 173 : 195 - 205
  • [7] Task-Based Conjugate Gradient: From Multi-GPU Towards Heterogeneous Architectures
    Agullo, E.
    Giraud, L.
    Guermouche, A.
    Nakov, S.
    Roman, J.
    EURO-PAR 2016: PARALLEL PROCESSING WORKSHOPS, 2017, 10104 : 69 - 82
  • [8] Data Parallel Skeletons for GPU Clusters and Multi-GPU Systems
    Ernsting, Steffen
    Kuchen, Herbert
    APPLICATIONS, TOOLS AND TECHNIQUES ON THE ROAD TO EXASCALE COMPUTING, 2012, 22 : 509 - 518
  • [9] High performance iterative solver for linear system using multi GPU
    Ikuno S.
    Fujita N.
    Kawaguchi Y.
    Itoh T.
    Nakata S.
    Watanabe K.
    Nakamura H.
    Plasma and Fusion Research, 2011, 6 (1 SPECIAL ISSUE)
  • [10] Hybrid Multi-GPU Solver Based on Schur Complement Method
    Kopysov, Sergey
    Kuzmin, Igor
    Nedozhogin, Nikita
    Novikov, Alexander
    Sagdeeva, Yulia
    PARALLEL COMPUTING TECHNOLOGIES (PACT 2013), 2013, 7979 : 65 - 79