Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters

Cited by: 21
Authors
Lai, Jianqi [1]
Yu, Hang [1]
Tian, Zhengyu [1]
Li, Hua [1]
Affiliation
[1] National University of Defense Technology, College of Aerospace Science and Engineering, Changsha 410073, People's Republic of China
Keywords
DIRECT NUMERICAL-SIMULATION; FLOW SOLVER; MESHLESS METHOD; OPTIMIZATION; CPU/GPU; SEQUEL; SCHEME; GRIDS;
DOI
10.1155/2020/8862123
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Graphics processing units (GPUs) offer strong floating-point capability and high memory bandwidth for data-parallel workloads and are widely used in high-performance computing (HPC). Compute Unified Device Architecture (CUDA) serves as a parallel computing platform and programming model that reduces the complexity of GPU programming. Programmable GPUs are becoming popular in computational fluid dynamics (CFD) applications. In this work, we propose a hybrid parallel algorithm that combines the Message Passing Interface (MPI) and CUDA for CFD applications on multi-GPU HPC clusters. The AUSM+-up upwind scheme and the three-step Runge-Kutta method are used for spatial and temporal discretization, respectively. Turbulence is modeled with the k-omega SST two-equation model. The CPU only manages GPU execution and communication, while the GPU is responsible for data processing. Parallel execution and memory access optimizations are applied to the GPU-based CFD codes. We propose a nonblocking communication method that fully overlaps GPU computing, CPU-CPU communication, and CPU-GPU data transfer by creating two CUDA streams. Furthermore, a one-dimensional domain decomposition method is used to balance the workload among GPUs. Finally, we evaluate the hybrid parallel algorithm on the compressible turbulent flow over a flat plate. The performance of the single-GPU implementation and the scalability on multi-GPU clusters are discussed. Performance measurements show that multi-GPU parallelization achieves a speedup of more than 36 times with respect to CPU-based parallel computing, and that the parallel algorithm scales well.
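The abstract mentions a one-dimensional domain decomposition used to balance the workload among GPUs, but does not say how the split is computed. The sketch below is one possible illustration, not the authors' implementation: it assumes the grid is cut along a single axis into contiguous slabs whose sizes differ by at most one plane, and that each slab exchanges halo data only with its left and right neighbours. The function names `decompose_1d` and `halo_neighbors` are hypothetical.

```python
from typing import List, Optional, Tuple


def decompose_1d(n_planes: int, n_gpus: int) -> List[Tuple[int, int]]:
    """Split n_planes grid planes into n_gpus contiguous slabs.

    Returns half-open index ranges (start, stop), one per GPU rank.
    Slab sizes differ by at most one plane (assumed balancing rule).
    """
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    if n_planes < n_gpus:
        raise ValueError("need at least one plane per GPU")
    base, extra = divmod(n_planes, n_gpus)
    slabs = []
    start = 0
    for rank in range(n_gpus):
        # The first `extra` ranks take one additional plane each.
        size = base + (1 if rank < extra else 0)
        slabs.append((start, start + size))
        start += size
    return slabs


def halo_neighbors(rank: int, n_gpus: int) -> Tuple[Optional[int], Optional[int]]:
    """Return the (left, right) neighbour ranks, or None at a domain boundary."""
    left = rank - 1 if rank > 0 else None
    right = rank + 1 if rank < n_gpus - 1 else None
    return left, right


if __name__ == "__main__":
    for rank, (lo, hi) in enumerate(decompose_1d(10, 4)):
        print(rank, (lo, hi), halo_neighbors(rank, 4))
```

With a one-dimensional split, each rank has at most two neighbours, which keeps the halo exchange pattern simple and is one reason such a layout pairs naturally with nonblocking point-to-point communication.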
Pages: 15
Related papers
50 in total
  • [31] Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI
    Xu, Ao
    Li, Bo-Tao
    INTERNATIONAL JOURNAL OF HEAT AND MASS TRANSFER, 2023, 201
  • [32] Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation
    Ayala, Alan
    Tomov, Stanimire
    Luo, Xi
    Shaiek, Hejer
    Haidar, Azzam
    Bosilca, George
    Dongarra, Jack
    PROCEEDINGS OF 2019 IEEE/ACM WORKSHOP ON EXASCALE MPI (EXAMPI 2019), 2019, : 12 - 18
  • [33] Accelerated CFD computations on multi-GPU using OpenMP and OpenACC
    Harshad Bhusare
    Nandan Sarkar
    Debajyoti Kumar
    Somnath Roy
    Sādhanā, 49
  • [34] Accelerated CFD computations on multi-GPU using OpenMP and OpenACC
    Bhusare, Harshad
    Sarkar, Nandan
    Kumar, Debajyoti
    Roy, Somnath
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2024, 49 (01):
  • [35] Experiences using hybrid MPI/OpenMP in the real world: Parallelization of a 3D CFD solver for multi-core node clusters
    Jost, Gabriele
    Robins, Bob
    SCIENTIFIC PROGRAMMING, 2010, 18 (3-4) : 127 - 138
  • [36] CUDA-MPI implementation of fast multipole method on GPU clusters for dielectric objects
    2018, Applied Computational Electromagnetics Society (ACES) (33):
  • [37] CUDA-MPI implementation of fast multipole method on GPU clusters for dielectric objects
    Tran, Nghia
    Phan, Tuan
    Kilic, Ozlem
    Applied Computational Electromagnetics Society Newsletter, 2018, 33 (02): : 224 - 227
  • [38] CUDA-MPI Implementation of Fast Multipole Method on GPU Clusters for Dielectric Objects
    Nghia Tran
    Tuan Phan
    Kilic, Ozlem
    APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY JOURNAL, 2018, 33 (02): : 224 - 227
  • [39] Parallelization of lattice Boltzmann software for execution on multi-GPU clusters with application to the simulation of blood flow through human arteries
    Djukic, Tijana
    Filipovic, Nenad
    2021 IEEE 21ST INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (IEEE BIBE 2021), 2021,
  • [40] Multi-user predictive rendering on remote multi-GPU clusters
    Randrianandrasana, J.
    Chanonier, A.
    Deleau, H.
    Muller, T.
    Porral, P.
    Krajecki, M.
    Lucas, L.
    2018 IEEE FOURTH VR INTERNATIONAL WORKSHOP ON COLLABORATIVE VIRTUAL ENVIRONMENTS (3DCVE), 2018,