Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters

Cited by: 21
Authors
Lai, Jianqi [1]
Yu, Hang [1]
Tian, Zhengyu [1]
Li, Hua [1]
Affiliations
[1] Natl Univ Def Technol, Coll Aerosp Sci & Engn, Changsha 410073, Peoples R China
Keywords
DIRECT NUMERICAL-SIMULATION; FLOW SOLVER; MESHLESS METHOD; OPTIMIZATION; CPU/GPU; SEQUEL; SCHEME; GRIDS;
DOI
10.1155/2020/8862123
Chinese Library Classification
TP31 [Computer software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Graphics processing units (GPUs) offer strong floating-point performance and high memory bandwidth for data-parallel workloads and are widely used in high-performance computing (HPC). Compute unified device architecture (CUDA) serves as a parallel computing platform and programming model for the GPU that reduces programming complexity. Programmable GPUs are becoming popular in computational fluid dynamics (CFD) applications. In this work, we propose a hybrid parallel algorithm that combines the message passing interface (MPI) and CUDA for CFD applications on multi-GPU HPC clusters. The AUSM+UP upwind scheme and the three-step Runge-Kutta method are used for spatial discretization and time discretization, respectively. Turbulence is modeled with the k-omega SST two-equation model. The CPU only manages GPU execution and communication, while the GPU is responsible for data processing. Parallel execution and memory access optimizations are applied to the GPU-based CFD codes. We propose a nonblocking communication method that uses two CUDA streams to fully overlap GPU computing, CPU-CPU communication, and CPU-GPU data transfer. Furthermore, a one-dimensional domain decomposition method is used to balance the workload among GPUs. Finally, we evaluate the hybrid parallel algorithm on the compressible turbulent flow over a flat plate. The performance of the single-GPU implementation and the scalability on multi-GPU clusters are discussed. Performance measurements show that multi-GPU parallelization achieves a speedup of more than 36 times with respect to CPU-based parallel computing, and the parallel algorithm has good scalability.
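The one-dimensional domain decomposition mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `decompose_1d` and `halo_neighbors` are hypothetical, and the sketch assumes the grid is split into contiguous slabs along a single axis, with each slab assigned to one MPI rank (one GPU) and any remainder planes spread over the lowest ranks.

```python
from typing import List, Optional, Tuple


def decompose_1d(n_planes: int, n_ranks: int) -> List[Tuple[int, int]]:
    """Split n_planes grid planes into n_ranks contiguous half-open ranges.

    Hypothetical helper: slab sizes differ by at most one plane, which
    keeps the per-GPU workload as even as a 1D split allows.
    """
    if n_ranks <= 0 or n_planes < n_ranks:
        raise ValueError("need at least one plane per rank")
    base, extra = divmod(n_planes, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional plane each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def halo_neighbors(rank: int, n_ranks: int) -> Tuple[Optional[int], Optional[int]]:
    """Return the (left, right) neighbor ranks, or None at a domain boundary.

    In a 1D split each rank exchanges boundary (halo) data with at most
    two neighbors.
    """
    left = rank - 1 if rank > 0 else None
    right = rank + 1 if rank < n_ranks - 1 else None
    return left, right


if __name__ == "__main__":
    for rank, (lo, hi) in enumerate(decompose_1d(10, 4)):
        print(rank, (lo, hi), halo_neighbors(rank, 4))
```

With a 1D split, each interior rank has exactly two neighbors, which keeps the halo exchange pattern simple and is the data that a nonblocking scheme would transfer while interior cells are still being computed.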
Pages: 15
Related papers
50 in total
  • [21] GPU Parallelization of a Hybrid Pseudospectral Geophysical Turbulence Framework Using CUDA
    Rosenberg, Duane
    Mininni, Pablo D.
    Reddy, Raghu
    Pouquet, Annick
    ATMOSPHERE, 2020, 11 (02)
  • [22] A multi-GPU and CUDA-aware MPI-based spectral element formulation for ultrasonic wave propagation in solid media
    Li, Feilong
    Zou, Fangxin
    Rao, Jing
    ULTRASONICS, 2023, 134
  • [23] Effective Multi-GPU Communication Using Multiple CUDA Streams and Threads
    Sourouri, Mohammed
    Gillberg, Tor
    Baden, Scott B.
    Cai, Xing
    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014, : 981 - 986
  • [24] DART-CUDA: A PGAS Runtime System for Multi-GPU Systems
    Zhou, Lei
    Fuerlinger, Karl
    2015 14TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC), 2015, : 110 - 119
  • [25] Impact of Reduced and Mixed-Precision on the Efficiency of a Multi-GPU Platform on CFD Applications
    Freytag, Gabriel
    Lima, Joao V. F.
    Rech, Paolo
    Navaux, Philippe O. A.
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2022 WORKSHOPS, PART IV, 2022, 13380 : 570 - 587
  • [26] Automatic Parallelization of Kernels in Shared-Memory Multi-GPU Nodes
    Cabezas, Javier
    Vilanova, Lluis
    Gelado, Isaac
    Jablin, Thomas B.
    Navarro, Nacho
    Hwu, Wen-mei W.
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'15), 2015, : 3 - 13
  • [27] WORKLOAD-AWARE AUTOMATIC PARALLELIZATION FOR MULTI-GPU DNN TRAINING
    Shin, Sungho
    Jo, Youngmin
    Choi, Jungwook
    Venkataramani, Swagath
    Srinivasan, Vijayalakshmi
    Sung, Wonyong
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 1453 - 1457
  • [28] Multi-GPU Design and Performance Evaluation of Homomorphic Encryption on GPU Clusters
    Al Badawi, Ahmad
    Veeravalli, Bharadwaj
    Lin, Jie
    Xiao, Nan
    Kazuaki, Matsumura
    Khin Mi Mi, Aung
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (02) : 379 - 391
  • [29] GPU-Centered Parallel Model on Heterogeneous Multi-GPU Clusters
    Wang, Feng
    PROCEEDINGS OF 2012 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2012), 2012, : 1865 - 1868
  • [30] Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI
    Xu, Ao
    Li, Bo-Tao
    INTERNATIONAL JOURNAL OF HEAT AND MASS TRANSFER, 2023, 201