Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

Cited by: 0
Authors
XIONG QinGang [1]
Institutions
[2] Graduate University of Chinese Academy of Sciences
Funding
National Natural Science Foundation of China
Keywords
asynchronous execution; compute unified device architecture; graphic processing unit; lattice Boltzmann method; non-blocking message passing interface; OpenMP
DOI
Not available
Chinese Library Classification number
TP391.41
Discipline classification code
080203
Abstract
Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedups have been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively or systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP, and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested on two-dimensional Couette flow, and the results are in good agreement with the analytical solution. For both one- and two-dimensional decomposition of space, the algorithm performs well because most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to three-dimensional decomposition of space and to other modeling methods, including explicit grid-based methods.
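The idea of decomposing space and hiding communication can be illustrated with a minimal sketch. The code below is not the authors' implementation: it is a pure-Python toy on a one-dimensional periodic lattice with only right-moving and left-moving populations, and all function names are hypothetical. It assumes the lattice length is divisible by the number of subdomains. Each subdomain first updates the cells that need no neighbor data, then fills its two edge cells from the halo values. On a real cluster the halo exchange would presumably be the non-blocking MPI step and the interior update the GPU kernel that overlaps with it.

```python
# Toy sketch (assumption: 1D periodic lattice, two populations, hypothetical names).

def stream_global(right, left):
    """Reference: periodic streaming on the undivided lattice."""
    n = len(right)
    new_right = [right[(i - 1) % n] for i in range(n)]  # right-movers shift by +1
    new_left = [left[(i + 1) % n] for i in range(n)]    # left-movers shift by -1
    return new_right, new_left


def split(values, parts):
    """Cut a list into `parts` equal contiguous chunks (length must be divisible)."""
    size = len(values) // parts
    return [values[p * size:(p + 1) * size] for p in range(parts)]


def stream_decomposed(right, left, parts):
    """Same streaming step, done per subdomain with a one-cell halo exchange."""
    r_chunks, l_chunks = split(right, parts), split(left, parts)

    # Halo exchange: each subdomain receives the last right-mover of its left
    # neighbor and the first left-mover of its right neighbor.
    halo_r = [r_chunks[(p - 1) % parts][-1] for p in range(parts)]
    halo_l = [l_chunks[(p + 1) % parts][0] for p in range(parts)]

    new_right, new_left = [], []
    for p in range(parts):
        r, l = r_chunks[p], l_chunks[p]
        # Interior update: uses only local data, so it could run while the
        # halo exchange is still in flight.
        nr = [None] + r[:-1]
        nl = l[1:] + [None]
        # Edge update: needs the halo values, so it runs after the exchange.
        nr[0] = halo_r[p]
        nl[-1] = halo_l[p]
        new_right += nr
        new_left += nl
    return new_right, new_left


# Usage: the decomposed step should reproduce the global step exactly.
right0 = list(range(12))
left0 = [10 * i for i in range(12)]
result = stream_decomposed(right0, left0, parts=3)
```

Separating the interior update from the edge update is the design choice that allows communication time to be hidden: only the edge cells wait on data from neighboring subdomains.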
Pages: 707-715
Page count: 9
Related papers
50 in total
  • [31] The fast multipole method on parallel clusters, multicore processors, and graphics processing units
    Darve, Eric
    Cecka, Cris
    Takahashi, Toru
    COMPTES RENDUS MECANIQUE, 2011, 339 (2-3): : 185 - 193
  • [32] PARALLEL IMPLEMENTATION OF A HYPERSPECTRAL UNMIXING CHAIN: GRAPHIC PROCESSING UNITS VERSUS MULTI-CORE PROCESSORS
    Bernabe, Sergio
    Plaza, Antonio
    Lopez, Sebastian
    Sarmiento, Roberto
    2012 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2012, : 3463 - 3466
  • [33] Parallel Sparse Approximate Inverse Preconditioning on Graphic Processing Units
    Dehnavi, Maryam Mehri
    Fernandez, David M.
    Gaudiot, Jean-Luc
    Giannacopoulos, Dennis D.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2013, 24 (09) : 1852 - 1862
  • [34] Implementation of the Courtemanche's Auricle Model on Graphic Processing Units
    Osorio, John
    Hincapie, Juan
    Marin, Daniel
    Valencia, Ivan
    Henao, Oscar
    2016 IEEE 11TH COLOMBIAN COMPUTING CONFERENCE (CCC), 2016,
  • [35] Parallel implementation of a lattice Boltzmann algorithm for the electrostatic plasma turbulence
    Fogaccia, G
    HIGH-PERFORMANCE COMPUTING AND NETWORKING, 1998, 1401 : 213 - 222
  • [36] An Efficient Parallel Computing Method for the Processing of Large Sensed Data
    Li, Dandan
    Ji, Xiaohui
    Wang, Qun
    AUTOMATIKA, 2013, 54 (04) : 471 - 482
  • [37] LATTICE BOLTZMANN SIMULATIONS OF CAVITY FLOWS ON GRAPHIC PROCESSING UNIT WITH MEMORY MANAGEMENT
    Hong, P. Y.
    Huang, L. M.
    Chang, C. Y.
    Lin, C. A.
    JOURNAL OF MECHANICS, 2017, 33 (06) : 863 - 871
  • [38] An efficient swap algorithm for the lattice Boltzmann method
    Mattila, Keijo
    Hyvaluoma, Jari
    Rossi, Tuomo
    Aspnas, Mats
    Westerholm, Jan
    COMPUTER PHYSICS COMMUNICATIONS, 2007, 176 (03) : 200 - 210
  • [39] Efficient Implementation of McEliece Cryptosystem on Graphic Processing Unit
    Elsobky, Alaa Mahmoud
    Farag, Abdelalim Kamal
    Keshk, Arabi
    INTERNATIONAL CONFERENCE ON INFORMATICS AND SYSTEMS (INFOS 2016), 2016, : 247 - 253
  • [40] GPU Based Parallel Computing of Lattice Boltzmann Method
    Zhang, Ruoxing
    Chou, Qiang
    Wang, Haidan
    Ge, Daochuan
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL AND INFORMATION SCIENCES (ICCIS 2014), 2014, : 43 - 49