Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

Cited by: 0

Authors:

XIONG QinGang (1)

Affiliations:

(2) Graduate University of Chinese Academy of Sciences

Source:

Science Bulletin | 2012, Issue 07

Funding:

National Natural Science Foundation of China

Keywords:

asynchronous execution; compute unified device architecture; graphic processing unit; lattice Boltzmann method; non-blocking message passing interface; OpenMP

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.41

Discipline classification code:

080203

Abstract:

Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested for two-dimensional Couette flow and the results are in good agreement with the analytical solution. For both the one- and two-dimensional decomposition of space, the algorithm performs well because most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to the three-dimensional decomposition of space and to other modeling methods, including explicit grid-based methods.
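The one-dimensional decomposition of space mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python instead of CUDA, OpenMP and MPI, a toy lattice with only two populations (one moving right, one moving left), and hypothetical function names (`stream_global`, `split`, `stream_decomposed`). It only shows that streaming on subdomains with a one-cell halo exchange reproduces streaming on the undivided domain. In a real multi-GPU code the halo values would travel by non-blocking message passing while interior cells are updated.

```python
def stream_global(f_right, f_left):
    """Periodic streaming on the undivided domain: each population moves one cell."""
    n = len(f_right)
    new_right = [f_right[(i - 1) % n] for i in range(n)]
    new_left = [f_left[(i + 1) % n] for i in range(n)]
    return new_right, new_left


def split(values, parts):
    """Cut a list into `parts` contiguous subdomains (length must divide evenly)."""
    size = len(values) // parts
    return [values[p * size:(p + 1) * size] for p in range(parts)]


def stream_decomposed(f_right, f_left, parts):
    """Same streaming step, but each subdomain uses only its own cells plus halos."""
    sub_r = split(f_right, parts)
    sub_l = split(f_left, parts)
    out_r, out_l = [], []
    for p in range(parts):
        # Halo exchange (assumption: stands in for message passing between ranks).
        halo_from_left = sub_r[(p - 1) % parts][-1]  # right-mover entering from the left neighbour
        halo_from_right = sub_l[(p + 1) % parts][0]  # left-mover entering from the right neighbour
        out_r.extend([halo_from_left] + sub_r[p][:-1])
        out_l.extend(sub_l[p][1:] + [halo_from_right])
    return out_r, out_l


f_right = [float(i) for i in range(12)]
f_left = [float(100 + i) for i in range(12)]
reference = stream_global(f_right, f_left)
decomposed = stream_decomposed(f_right, f_left, parts=3)
print(decomposed == reference)
```

Only the boundary cells of each subdomain depend on a neighbour, which is why the interior update can in principle proceed while the halo transfer is still in flight.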

Citation

Pages: 707-715

Page count: 9
