Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

Cited by: 0

Authors:

XIONG QinGang1

2 Graduate University of Chinese Academy of Sciences

Affiliations:

Source:

Science Bulletin | 2012 / Issue 07

Funding:

National Natural Science Foundation of China;

Keywords:

asynchronous execution; compute unified device architecture; graphic processing unit; lattice Boltzmann method; non-blocking message passing interface; OpenMP;

DOI:

Not available

Chinese Library Classification number:

TP391.41;

Discipline classification code:

080203;

Abstract:

Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested on two-dimensional Couette flow, and the results are in good agreement with the analytical solution. For both one- and two-dimensional decomposition of space, the algorithm performs well because most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to three-dimensional decomposition of space and to other modeling methods, including explicit grid-based methods.
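
The idea of hiding communication time behind computation can be illustrated with a small sketch. This is not the authors' implementation: it uses a 1D explicit diffusion stencil on a periodic grid instead of the LBM, and a Python thread pool stands in for the non-blocking MPI transfers and CUDA streams mentioned in the abstract. All names (`update_cell`, `step_global`, `step_decomposed`, the coefficient `alpha`) are hypothetical and chosen only for this illustration. The pattern is: start the halo exchange, update the interior cells that need no halo data while the exchange is in flight, then wait and finish the edge cells.

```python
from concurrent.futures import ThreadPoolExecutor


def update_cell(left, centre, right, alpha):
    """Explicit three-point stencil update (illustrative stand-in for an LBM step)."""
    return centre + alpha * (left - 2 * centre + right)


def step_global(u, alpha=0.25):
    """Reference: one step on the whole periodic grid, with no decomposition."""
    n = len(u)
    return [update_cell(u[(i - 1) % n], u[i], u[(i + 1) % n], alpha) for i in range(n)]


def step_decomposed(parts, pool, alpha=0.25):
    """One step on a list of subdomains (each with at least 2 cells).

    The halo fetch is submitted to a thread pool first, which plays the role
    of posting non-blocking sends and receives. Interior cells are updated
    while those requests are pending, and the edge cells are completed only
    after the halo values have arrived.
    """
    k = len(parts)

    def fetch_halos(r):
        # Edge values of the periodic left and right neighbour subdomains.
        return parts[(r - 1) % k][-1], parts[(r + 1) % k][0]

    # 1. Start the exchange (assumed analogue of non-blocking communication).
    pending = [pool.submit(fetch_halos, r) for r in range(k)]

    # 2. Overlap: interior cells depend only on local data.
    new_parts = []
    for p in parts:
        new = list(p)
        for i in range(1, len(p) - 1):
            new[i] = update_cell(p[i - 1], p[i], p[i + 1], alpha)
        new_parts.append(new)

    # 3. Wait for the halos, then finish the two edge cells of each subdomain.
    for r, p in enumerate(parts):
        left, right = pending[r].result()
        new_parts[r][0] = update_cell(left, p[0], p[1], alpha)
        new_parts[r][-1] = update_cell(p[-2], p[-1], right, alpha)
    return new_parts


# Usage: three subdomains of four cells each, advanced for ten steps and
# compared against the undecomposed reference.
u0 = [float(i % 5) for i in range(12)]
ref = list(u0)
parts = [u0[0:4], u0[4:8], u0[8:12]]
with ThreadPoolExecutor(max_workers=2) as pool:
    for _ in range(10):
        ref = step_global(ref)
        parts = step_decomposed(parts, pool)
flat = [x for p in parts for x in p]
```

In a real multi-GPU code the interior update would be a kernel launched in one stream while the boundary data is copied and exchanged in another; the sketch only shows the ordering of the three phases and that the decomposed result matches the undecomposed one.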


Pages: 707-715

Page count: 9

