Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units

Cited by: 0
Authors
XIONG QinGang [1]
Institutions
[2] Graduate University of Chinese Academy of Sciences
Funding
National Natural Science Foundation of China
Keywords
asynchronous execution; compute unified device architecture; graphic processing unit; lattice Boltzmann method; non-blocking message passing interface; OpenMP
DOI
Not available
Chinese Library Classification number
TP391.41
Discipline classification code
080203
Abstract
Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsically parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM on multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulations on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP, and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested for two-dimensional Couette flow, and the results are in good agreement with the analytical solution. For both one- and two-dimensional decomposition of space, the algorithm performs well because most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to three-dimensional decomposition of space and to other modeling methods, including explicit grid-based methods.
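The Couette-flow check mentioned in the abstract can be illustrated with a small serial sketch. This is not the authors' GPU code: the abstract does not give their collision model or boundary treatment, so the D2Q9 BGK scheme, the halfway bounce-back walls, and every name below (`couette_profile`, `u_wall`, `tau`) are assumptions chosen for illustration. Because plane Couette flow is uniform along the channel, one column of lattice nodes with periodic streamwise behavior is enough; the reference solution is the linear profile u(y) = U y / H, with the walls half a lattice spacing outside the first and last nodes.

```python
# Minimal serial D2Q9 BGK sketch of plane Couette flow (illustrative only,
# not the paper's implementation). Walls use halfway bounce-back (assumed).

# Lattice velocities, weights, and the index of each opposite direction.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
OPP = [0, 3, 4, 1, 2, 7, 8, 5, 6]


def equilibrium(rho, ux, uy):
    """Second-order D2Q9 equilibrium distribution."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def couette_profile(ny=8, u_wall=0.05, tau=1.0, steps=2000):
    """Return the streamwise velocity at each of the ny nodes across the gap.

    The bottom wall is at rest and the top wall moves at u_wall. The flow is
    uniform in x, so a single column of nodes represents the periodic channel.
    """
    f = [equilibrium(1.0, 0.0, 0.0) for _ in range(ny)]
    for _ in range(steps):
        # Collision: relax each population toward local equilibrium.
        post = []
        for fj in f:
            rho = sum(fj)
            ux = sum(c[0] * fi for c, fi in zip(C, fj)) / rho
            uy = sum(c[1] * fi for c, fi in zip(C, fj)) / rho
            feq = equilibrium(rho, ux, uy)
            post.append([fi - (fi - fe) / tau for fi, fe in zip(fj, feq)])
        # Streaming with halfway bounce-back at both walls.
        new = [[0.0] * 9 for _ in range(ny)]
        for j in range(ny):
            rho = sum(post[j])
            for i, (cx, cy) in enumerate(C):
                dest = j + cy
                if 0 <= dest < ny:
                    new[dest][i] = post[j][i]
                else:
                    # Reflect the population; a moving wall adds momentum.
                    uw = u_wall if dest >= ny else 0.0
                    new[j][OPP[i]] = post[j][i] - 6.0 * W[i] * rho * cx * uw
        f = new
    return [sum(c[0] * fi for c, fi in zip(C, fj)) / sum(fj) for fj in f]


if __name__ == "__main__":
    ny, u_wall = 8, 0.05
    profile = couette_profile(ny=ny, u_wall=u_wall)
    exact = [u_wall * (j + 0.5) / ny for j in range(ny)]
    print("max abs error:", max(abs(a - b) for a, b in zip(profile, exact)))
```

In a multi-GPU setting such as the one the abstract describes, the streaming step is where neighboring subdomains would exchange boundary populations; this sketch has no decomposition and therefore no communication.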
Pages: 707-715
Page count: 9