Optimizing Communications in multi-GPU Lattice Boltzmann Simulations

Cited by: 8

Authors:

Calore, Enrico [1,2]

Marchi, Davide [3]

Schifano, Sebastiano Fabio [2,3]

Tripiccione, Raffaele [1,2]

Affiliations:

[1] Univ Ferrara, Dipartimento Fis & Sci Terra, I-44122 Ferrara, Italy

[2] Ist Nazl Fis Nucl, I-44122 Ferrara, Italy

[3] Univ Ferrara, Dipartimento Matemat & Informat, I-44122 Ferrara, Italy

Source:

PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2015) | 2015

Keywords:

CODE;

DOI:

10.1109/HPCSim.2015.7237021

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

An increasingly large number of scientific applications run on large clusters based on GPU systems. In most cases the large-scale parallelism of the applications uses MPI, widely recognized as the de facto standard for building parallel applications, while several programming languages are used to express the parallelism available in the application and map it onto the parallel resources available on GPUs. Regular grids and stencil codes are used in a subset of these applications, often corresponding to computational "Grand Challenges". One such class of applications is Lattice Boltzmann Methods (LB), used in computational fluid dynamics. The regular structure of LB algorithms makes them suitable for processor architectures with a large degree of parallelism, like GPUs. Scalability of these applications on large clusters requires a careful design of processor-to-processor data communications, exploiting all possibilities to overlap communication and computation. This paper looks at these issues, considering as a use case a state-of-the-art two-dimensional LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equation of state of a perfect gas. We study in detail the interplay between data organization and data layout, data-communication options, and overlapping of communication and computation. We derive partial models of some performance features and compare them with experimental results for production-grade codes that we run on a large cluster of GPUs.


Pages: 55-62

Page count: 8

