Reducing Communication Overhead in Multi-GPU Hybrid Solver for 2D Laplace's Equation

Cited by: 1
Authors
Czapinski, Michal [1]
Thompson, Chris [1]
Barnes, Stuart [1]
Affiliations
[1] Cranfield Univ, Appl Math & Comp Grp, Cranfield MK43 0AL, Beds, England
Keywords
Hybrid parallelism; Multiple GPUs; Heterogeneous architectures; Non-blocking communication; Laplace solver; CUDA; CONJUGATE GRADIENTS; GRAPHICS; IMPLEMENTATION; COMPUTATION; OVERLAP
DOI
10.1007/s10766-013-0293-2
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The possibility of porting algorithms to graphics processing units (GPUs) has attracted significant interest among researchers. The natural next step is to employ multiple GPUs, but communication overhead may limit further performance improvement. In this paper, we investigate techniques for reducing this overhead on hybrid CPU-GPU platforms, including careful data layout, use of GPU memory spaces, and non-blocking communication. In addition, we propose an accurate automatic load balancing technique for heterogeneous environments. We validate our approach on a hybrid Jacobi solver for 2D Laplace's equation. Experiments carried out on various graphics hardware and types of connectivity confirm that the proposed data layout allows our fastest CUDA kernels to reach the analytical limit for memory bandwidth (up to 106 GB/s on an NVIDIA GTX 480), and that non-blocking communication significantly reduces overhead, allowing almost linear speed-up even when communication is carried out over relatively slow networks.
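For readers unfamiliar with the method named in the abstract, the Jacobi update for 2D Laplace's equation replaces every interior grid point with the average of its four neighbours and repeats until the values stop changing. The sketch below is a single-threaded, pure-Python illustration of that update only. It is not the authors' hybrid CUDA implementation, and the function name `jacobi_laplace`, the list-of-lists grid, and the `tol` and `max_iters` parameters are assumptions made for this example.

```python
def jacobi_laplace(grid, tol=1e-6, max_iters=10_000):
    """Jacobi iteration for 2D Laplace's equation on a rectangular grid.

    `grid` is a list of rows; the outermost rows and columns hold fixed
    (Dirichlet) boundary values, and the interior holds the initial guess.
    Returns the final grid and the number of sweeps performed.
    """
    rows, cols = len(grid), len(grid[0])
    current = [row[:] for row in grid]  # copy so the caller's grid is untouched

    for sweep in range(1, max_iters + 1):
        # Jacobi writes into a second buffer: every update reads only
        # values from the previous sweep.
        updated = [row[:] for row in current]
        max_change = 0.0

        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                updated[i][j] = 0.25 * (
                    current[i - 1][j] + current[i + 1][j]
                    + current[i][j - 1] + current[i][j + 1]
                )
                max_change = max(max_change, abs(updated[i][j] - current[i][j]))

        current = updated
        if max_change < tol:
            return current, sweep

    return current, max_iters


if __name__ == "__main__":
    n = 5
    # Boundary values u = x (column index scaled to [0, 1]); interior starts at 0.
    grid = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i in (0, n - 1) or j in (0, n - 1):
                grid[i][j] = j / (n - 1)
    solution, sweeps = jacobi_laplace(grid)
    print(f"converged after {sweeps} sweeps; centre value {solution[2][2]:.4f}")
```

Because each sweep reads only the previous sweep's values, the grid can be split into blocks of rows that are updated independently, with only the rows along each block boundary needing to be exchanged between sweeps. That exchange is the communication overhead the abstract refers to.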
Pages: 1032-1047
Page count: 16