Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

Cited by: 0
|
Author(s)
XIONG QinGang1
2 Graduate University of Chinese Academy of Sciences
Institution(s)
Funding
National Natural Science Foundation of China;
Keywords
asynchronous execution; compute unified device architecture; graphic processing unit; lattice Boltzmann method; non-blocking message passing interface; OpenMP;
DOI
Not available
CLC number
TP391.41 [];
Subject classification number
080203 ;
Abstract
Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP, and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested on two-dimensional Couette flow, and the results are in good agreement with the analytical solution. For both one- and two-dimensional decompositions of space, the algorithm performs well, as most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm for large-scale engineering applications. The algorithm can be directly extended to three-dimensional decompositions of space and to other modeling methods, including explicit grid-based methods.
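The Couette-flow validation mentioned in the abstract can be reproduced in miniature. Below is a minimal, self-contained NumPy sketch of a D2Q9 BGK lattice Boltzmann solver for 2-D Couette flow with halfway bounce-back walls. This is only an illustration of the validation case, not the authors' CUDA/MPI implementation; the grid size, relaxation time `tau`, and wall speed `U` are arbitrary choices made here for the sketch.

```python
import numpy as np

# D2Q9 lattice: discrete velocities, weights, and opposite directions
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
opp = np.array([0, 3, 4, 1, 2, 7, 8, 5, 6])

def equilibrium(rho, ux, uy):
    """BGK equilibrium distribution for D2Q9."""
    cu = 3.0 * (c[:, 0, None, None]*ux + c[:, 1, None, None]*uy)
    usq = 1.5 * (ux**2 + uy**2)
    return w[:, None, None] * rho * (1.0 + cu + 0.5*cu**2 - usq)

def couette(nx=8, ny=16, U=0.05, tau=0.8, steps=8000):
    """2-D Couette flow: bottom wall at rest, top wall sliding at speed U.

    Periodic in x; halfway bounce-back walls in y (walls sit half a
    cell beyond the first and last node rows, so the channel height
    is exactly ny lattice units).
    """
    rho = np.ones((ny, nx))
    ux = np.zeros((ny, nx))
    uy = np.zeros((ny, nx))
    f = equilibrium(rho, ux, uy)
    for _ in range(steps):
        # macroscopic moments
        rho = f.sum(axis=0)
        ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
        uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
        # BGK collision
        f += (equilibrium(rho, ux, uy) - f) / tau
        fpost = f.copy()                     # post-collision populations
        # streaming: periodic roll; wall rows are corrected below
        for i in range(9):
            f[i] = np.roll(np.roll(f[i], c[i, 0], axis=1), c[i, 1], axis=0)
        # halfway bounce-back at the resting bottom wall (row 0)
        for i in (2, 5, 6):                  # directions entering from below
            f[i][0, :] = fpost[opp[i]][0, :]
        # moving-wall bounce-back at the top wall (row ny-1),
        # with the momentum correction -6*w_i*(c_i . u_wall), rho ~ 1 assumed
        for i in (2, 5, 6):                  # directions leaving through the top
            f[opp[i]][-1, :] = fpost[i][-1, :] - 6.0 * w[i] * c[i, 0] * U
    return ux[:, 0]                          # velocity profile along y
```

At steady state the computed profile matches the analytical linear solution `u(y_j) = U*(j + 0.5)/ny`, with the half-cell offset coming from the halfway bounce-back placement of the walls.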
Pages: 707-715
Number of pages: 9
Related papers
50 records
  • [21] Implementation of a Lattice Boltzmann Method for Large Eddy Simulation on Multiple GPUs
    Li, Qinjian
    Zhong, Chengwen
    Li, Kai
    Zhang, Guangyong
    Lu, Xiaowei
    Zhang, Qing
    Zhao, Kaiyong
    Chu, Xiaowen
    2012 IEEE 14TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2012 IEEE 9TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (HPCC-ICESS), 2012, : 818 - 823
  • [22] GPU parallel implementation of a finite volume lattice Boltzmann method for incompressible flows
    Wen, Mengke
    Shen, Siyuan
    Li, Weidong
    Computers and Fluids, 2024, 285
  • [23] Efficient Implementation of Total FETI Solver for Graphic Processing Units Using Schur Complement
    Riha, Lubomir
    Brzobohaty, Tomas
    Markopoulos, Alexandros
    Kozubek, Tomas
    Meca, Ondrej
    Schenk, Olaf
    Vanroose, Wim
    HIGH PERFORMANCE COMPUTING IN SCIENCE AND ENGINEERING, HPCSE 2015, 2016, 9611 : 85 - 100
  • [24] Parallel data cube computation on graphic processing units
    Zhou G.-L.
    Chen H.
    Li C.-P.
    Wang S.
    Zheng T.
    Jisuanji Xuebao/Chinese Journal of Computers, 2010, 33 (10): : 1788 - 1798
  • [25] PARALLEL EFFICIENT METHOD OF MOMENTS EXPLOITING GRAPHICS PROCESSING UNITS
    De Donno, D.
    Esposito, A.
    Monti, G.
    Tarricone, L.
    MICROWAVE AND OPTICAL TECHNOLOGY LETTERS, 2010, 52 (11) : 2568 - 2572
  • [26] Implementation of Iron Loss Model on Graphic Processing Units
    Hussain, Sajid
    Silva, Rodrigo C. P.
    Lowther, David A.
    IEEE TRANSACTIONS ON MAGNETICS, 2016, 52 (03)
  • [27] A parallel lattice-Boltzmann method for large scale simulations of complex fluids
    Nekovee, M
    Chin, J
    González-Segredo, N
    Coveney, PV
    COMPUTATIONAL FLUID DYNAMICS, 2001, : 204 - 212
  • [28] Parallel Lattice Boltzmann Method with Blocked Partitioning
    Schepke, Claudio
    Maillard, Nicolas
    Navaux, Philippe O. A.
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2009, 37 (06) : 593 - 611
  • [30] Efficient computation of the geopotential gradient in graphic processing units
    Rubio, Carlos
    Gonzalo, Jesus
    Siminski, Jan
    Escapa, Alberto
    ADVANCES IN SPACE RESEARCH, 2024, 74 (01) : 332 - 347