Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters

Cited by: 21
Authors
Lai, Jianqi [1]
Yu, Hang [1]
Tian, Zhengyu [1]
Li, Hua [1]
Affiliation
[1] Natl Univ Def Technol, Coll Aerosp Sci & Engn, Changsha 410073, Peoples R China
Keywords
DIRECT NUMERICAL-SIMULATION; FLOW SOLVER; MESHLESS METHOD; OPTIMIZATION; CPU/GPU; SEQUEL; SCHEME; GRIDS;
DOI
10.1155/2020/8862123
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Graphics processing units (GPUs) offer strong floating-point capability and high memory bandwidth for data-parallel workloads and are widely used in high-performance computing (HPC). Compute Unified Device Architecture (CUDA) serves as a parallel computing platform and programming model that reduces the complexity of GPU programming. Programmable GPUs are becoming popular in computational fluid dynamics (CFD) applications. In this work, we propose a hybrid parallel algorithm that combines the Message Passing Interface (MPI) and CUDA for CFD applications on multi-GPU HPC clusters. The AUSM+UP upwind scheme and the three-step Runge-Kutta method are used for spatial and temporal discretization, respectively. Turbulence is modeled with the k-omega SST two-equation model. The CPU only manages GPU execution and communication, while the GPU is responsible for data processing. Parallel execution and memory access optimizations are applied to the GPU-based CFD code. We propose a nonblocking communication method that fully overlaps GPU computing, CPU-CPU communication, and CPU-GPU data transfer by creating two CUDA streams. Furthermore, a one-dimensional domain decomposition method is used to balance the workload among GPUs. Finally, we evaluate the hybrid parallel algorithm on the compressible turbulent flow over a flat plate. The performance of the single-GPU implementation and the scalability on multi-GPU clusters are discussed. Performance measurements show that multi-GPU parallelization achieves a speedup of more than 36 times with respect to CPU-based parallel computing, and that the parallel algorithm scales well.
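The one-dimensional domain decomposition mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `decompose_1d` and `neighbors`, the half-open index ranges, and the rule of giving leftover grid planes to the lowest ranks are all assumptions made for illustration. The sketch only shows how grid planes along one direction might be split evenly across GPUs and which ranks would exchange boundary data.

```python
from typing import List, Optional, Tuple


def decompose_1d(n_planes: int, n_ranks: int) -> List[Tuple[int, int]]:
    """Split n_planes grid planes into n_ranks contiguous half-open ranges.

    Hypothetical helper: sizes differ by at most one plane, and any
    remainder goes to the lowest-numbered ranks.
    """
    if n_ranks <= 0 or n_planes < n_ranks:
        raise ValueError("need at least one plane per rank")
    base, extra = divmod(n_planes, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def neighbors(rank: int, n_ranks: int) -> Tuple[Optional[int], Optional[int]]:
    """Return the (left, right) neighbor ranks, or None at a physical boundary."""
    left = rank - 1 if rank > 0 else None
    right = rank + 1 if rank < n_ranks - 1 else None
    return left, right


if __name__ == "__main__":
    for rank, (lo, hi) in enumerate(decompose_1d(10, 4)):
        print(rank, (lo, hi), neighbors(rank, 4))
```

With a one-dimensional split, each rank has at most two neighbors, so each step needs at most two boundary exchanges per GPU. That is the kind of communication the abstract's two-stream overlap scheme would hide behind interior computation.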
Pages: 15