Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters

Cited by: 21
Authors
Lai, Jianqi [1]
Yu, Hang [1]
Tian, Zhengyu [1]
Li, Hua [1]
Affiliations
[1] Natl Univ Def Technol, Coll Aerosp Sci & Engn, Changsha 410073, Peoples R China
Keywords
DIRECT NUMERICAL-SIMULATION; FLOW SOLVER; MESHLESS METHOD; OPTIMIZATION; CPU/GPU; SEQUEL; SCHEME; GRIDS;
DOI
10.1155/2020/8862123
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Graphics processing units (GPUs) have a strong floating-point capability and a high memory bandwidth in data parallelism and have been widely used in high-performance computing (HPC). Compute unified device architecture (CUDA) is used as a parallel computing platform and programming model for the GPU to reduce the complexity of programming. The programmable GPUs are becoming popular in computational fluid dynamics (CFD) applications. In this work, we propose a hybrid parallel algorithm of the message passing interface (MPI) and CUDA for CFD applications on multi-GPU HPC clusters. The AUSM+UP upwind scheme and the three-step Runge-Kutta method are used for spatial discretization and time discretization, respectively. The turbulent solution is solved by the K-omega SST two-equation model. The CPU only manages the execution of the GPU and communication, and the GPU is responsible for data processing. Parallel execution and memory access optimizations are used to optimize the GPU-based CFD codes. We propose a nonblocking communication method to fully overlap GPU computing, CPU-CPU communication, and CPU-GPU data transfer by creating two CUDA streams. Furthermore, the one-dimensional domain decomposition method is used to balance the workload among GPUs. Finally, we evaluate the hybrid parallel algorithm with the compressible turbulent flow over a flat plate. The performance of a single GPU implementation and the scalability of multi-GPU clusters are discussed. Performance measurements show that multi-GPU parallelization can achieve a speedup of more than 36 times with respect to CPU-based parallel computing, and the parallel algorithm has good scalability.
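As a rough illustration of the one-dimensional domain decomposition mentioned in the abstract, the sketch below splits the grid planes along a single axis into contiguous slabs, one per GPU, so that slab sizes differ by at most one plane. The function name `decompose_1d`, the even-split policy, and the example sizes are assumptions made for illustration; the record does not describe how the authors partition the grid in detail.

```python
def decompose_1d(n_planes: int, n_gpus: int) -> list[tuple[int, int]]:
    """Split n_planes grid planes along one axis into n_gpus contiguous slabs.

    Returns half-open (start, stop) index ranges, one per GPU.
    Slab sizes differ by at most one plane (hypothetical balancing policy).
    """
    if n_gpus <= 0 or n_planes < n_gpus:
        raise ValueError("need at least one plane per GPU")
    # Every GPU gets `base` planes; the first `extra` GPUs get one more.
    base, extra = divmod(n_planes, n_gpus)
    ranges = []
    start = 0
    for rank in range(n_gpus):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Example sizes are arbitrary, not taken from the paper.
    for rank, (lo, hi) in enumerate(decompose_1d(130, 4)):
        print(f"GPU {rank}: planes [{lo}, {hi}) -> {hi - lo} planes")
```

In a decomposition of this kind, each slab exchanges boundary data only with its two neighbors along the split axis, which keeps the communication pattern simple.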
Pages: 15
Related papers
50 in total
  • [1] Hybrid MPI and CUDA paralleled finite volume unstructured CFD simulations on a multi-GPU system
    Zhang, Xi
    Guo, Xiaohu
    Weng, Yue
    Zhang, Xianwei
    Lu, Yutong
    Zhao, Zhong
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2023, 139 : 1 - 16
  • [2] Multi-GPU Kinetic Solvers using MPI and CUDA
    Zabelok, Sergey
    Arslanbekov, Robert
    Kolobov, Vladimir
    PROCEEDINGS OF THE 29TH INTERNATIONAL SYMPOSIUM ON RAREFIED GAS DYNAMICS, 2014, 1628 : 539 - 546
  • [3] Problems Related to Parallelization of CFD Algorithms on GPU, Multi-GPU and Hybrid Architectures.
    Blazewicz, Marek
    Kurowski, Krzysztof
    Ludwiczak, Bogdan
    Napierala, Krystyna
    NUMERICAL ANALYSIS AND APPLIED MATHEMATICS, VOLS I-III, 2010, 1281 : 1301 - 1304
  • [4] Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters
    Yang, Chao-Tung
    Huang, Chih-Lin
    Lin, Cheng-Fang
    COMPUTER PHYSICS COMMUNICATIONS, 2011, 182 (01) : 266 - 269
  • [5] Parallel QR Factorization using Givens Rotations in MPI-CUDA for Multi-GPU
    Tapia-Romero, Miguel
    Meneses-Viveros, Amilcar
    Hernandez-Rubio, Erika
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (05) : 636 - 645
  • [6] Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes
    Jo, Gangwon
    Nah, Jeongho
    Lee, Jun
    Kim, Jungwon
    Lee, Jaejin
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015, 26 (07) : 1814 - 1825
  • [7] Optimizing MPI Communication on Multi-GPU Systems using CUDA Inter-Process Communication
    Potluri, S.
    Wang, H.
    Bureddy, D.
    Singh, A. K.
    Rosales, C.
    Panda, D. K.
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 1848 - 1857
  • [8] Design of a Hybrid MPI-CUDA Benchmark Suite for CPU-GPU Clusters
    Agarwal, Tejaswi
    Becchi, Michela
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT'14), 2014, : 505 - 506
  • [9] Benchmarking multi-GPU applications on modern multi-GPU integrated systems
    Bernaschi, Massimo
    Agostini, Elena
    Rossetti, Davide
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (14):
  • [10] The Optimization of Model Parallelization Strategies for Multi-GPU Training
    Zhang, Zechao
    Chen, Jianfeng
    Hu, Bing
    2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021,