Optimizing Communications in multi-GPU Lattice Boltzmann Simulations

Cited by: 8
Authors
Calore, Enrico [1, 2]
Marchi, Davide [3]
Schifano, Sebastiano Fabio [2, 3]
Tripiccione, Raffaele [1, 2]
Affiliations
[1] Univ Ferrara, Dipartimento Fis & Sci Terra, I-44122 Ferrara, Italy
[2] Ist Nazl Fis Nucl, I-44122 Ferrara, Italy
[3] Univ Ferrara, Dipartimento Matemat & Informat, I-44122 Ferrara, Italy
Keywords
CODE;
DOI
10.1109/HPCSim.2015.7237021
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
An increasingly large number of scientific applications run on large clusters of GPU-based systems. In most cases, large-scale parallelism across nodes is handled with MPI, widely recognized as the de facto standard for building parallel applications, while several programming languages are used to express the parallelism available within the application and map it onto the parallel resources of the GPUs. A subset of these applications, often corresponding to computational "Grand Challenges", uses regular grids and stencil codes. One such class of applications is Lattice Boltzmann (LB) methods, used in computational fluid dynamics. The regular structure of LB algorithms makes them well suited to processor architectures with a large degree of parallelism, such as GPUs. Scaling these applications on large clusters requires a careful design of processor-to-processor data communications that exploits every opportunity to overlap communication and computation. This paper looks at these issues, taking as a use case a state-of-the-art two-dimensional LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equation of state of a perfect gas. We study in detail the interplay between data organization and data layout, data-communication options, and the overlap of communication and computation. We derive partial models of some performance features and compare them with experimental results for production-grade codes that we run on a large cluster of GPUs.
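The overlap of communication and computation mentioned in the abstract rests on one structural idea: sites in the interior (bulk) of a subdomain do not depend on halo data from neighbours, so they can be updated while the halo exchange is in flight, and only the border sites must wait. The following is a minimal sketch of that split, not the authors' code. It assumes a toy 1D three-point stencil on a periodic domain in place of the 2D LB model, and it runs sequentially in plain Python, so the "exchange" is just a list lookup; the names `stencil`, `step_global`, and `step_split` are hypothetical.

```python
# Toy 1D three-point stencil standing in for an LB update.
# Assumption: this is an illustration only, not the model used in the paper.
def stencil(left, centre, right):
    return 0.25 * left + 0.5 * centre + 0.25 * right


def step_global(u):
    """Reference: update the whole periodic domain in one piece."""
    n = len(u)
    return [stencil(u[(i - 1) % n], u[i], u[(i + 1) % n]) for i in range(n)]


def step_split(u, parts=2):
    """Same update with the domain cut into `parts` equal subdomains.

    Requires len(u) divisible by `parts` and at least 2 sites per subdomain.
    """
    n = len(u)
    size = n // parts
    chunks = [u[p * size:(p + 1) * size] for p in range(parts)]

    # Step 1: halo exchange. In a real multi-GPU code this would be a
    # non-blocking transfer started here and completed before step 3.
    halos = []
    for p in range(parts):
        left_halo = chunks[(p - 1) % parts][-1]
        right_halo = chunks[(p + 1) % parts][0]
        halos.append((left_halo, right_halo))

    out = []
    for p, c in enumerate(chunks):
        new = [0.0] * size
        # Step 2: bulk update. Uses only local data, so it could run
        # while the halo transfer of step 1 is still in progress.
        for i in range(1, size - 1):
            new[i] = stencil(c[i - 1], c[i], c[i + 1])
        # Step 3: border update. Needs the received halo values.
        left_halo, right_halo = halos[p]
        new[0] = stencil(left_halo, c[0], c[1])
        new[-1] = stencil(c[-2], c[-1], right_halo)
        out.extend(new)
    return out


if __name__ == "__main__":
    u = [float(i * i % 7) for i in range(12)]
    same = all(abs(a - b) < 1e-12
               for a, b in zip(step_split(u, 3), step_global(u)))
    print("split update matches global update:", same)
```

The sketch only checks that the bulk/border split reproduces the single-domain result; any actual time saved depends on how long the bulk update takes relative to the transfer, which is the kind of trade-off the abstract says the paper models.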
Pages: 55 - 62
Number of pages: 8
Related papers
50 in total
  • [1] Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI
    Xu, Ao
    Li, Bo-Tao
    [J]. INTERNATIONAL JOURNAL OF HEAT AND MASS TRANSFER, 2023, 201
  • [3] Multi-GPU implementation of the lattice Boltzmann method
    Obrecht, Christian
    Kuznik, Frederic
    Tourancheau, Bernard
    Roux, Jean-Jacques
    [J]. COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2013, 65 (02) : 252 - 261
  • [4] Scalable multi-relaxation-time lattice Boltzmann simulations on multi-GPU cluster
    Hong, Pei-Yao
    Huang, Li-Min
    Lin, Li-Song
    Lin, Chao-An
    [J]. COMPUTERS & FLUIDS, 2015, 110 : 1 - 8
  • [5] Multi-GPU lattice Boltzmann simulations of turbulent square duct flow at high Reynolds numbers
    Xiang, Xing
    Su, Weite
    Hu, Tao
    Wang, Limin
    [J]. COMPUTERS & FLUIDS, 2023, 266
  • [6] Adjoint Lattice Boltzmann for topology optimization on multi-GPU architecture
    Laniewski-Wollk, L.
    Rokicki, J.
    [J]. COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2016, 71 (03) : 833 - 848
  • [7] Sailfish: A flexible multi-GPU implementation of the lattice Boltzmann method
    Januszewski, M.
    Kostur, M.
    [J]. COMPUTER PHYSICS COMMUNICATIONS, 2014, 185 (09) : 2350 - 2368
  • [8] The TheLMA project: Multi-GPU implementation of the lattice Boltzmann method
    Obrecht, Christian
    Kuznik, Frederic
    Tourancheau, Bernard
    Roux, Jean-Jacques
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2011, 25 (03) : 295 - 303
  • [9] An out-of-core method for physical simulations on a multi-GPU architecture using Lattice Boltzmann method
    Duchateau, Julien
    Rousselle, Francois
    Maquignon, Nicolas
    Roussel, Gilles
    Renaud, Christophe
    [J]. 2016 INT IEEE CONFERENCES ON UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING AND COMMUNICATIONS, CLOUD AND BIG DATA COMPUTING, INTERNET OF PEOPLE, AND SMART WORLD CONGRESS (UIC/ATC/SCALCOM/CBDCOM/IOP/SMARTWORLD), 2016 : 581 - 588
  • [10] Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster
    Wang, Xian
    Aoki, Takayuki
    [J]. PARALLEL COMPUTING, 2011, 37 (09) : 521 - 535