Hybrid MPI and CUDA paralleled finite volume unstructured CFD simulations on a multi-GPU system

Cited by: 10
Authors
Zhang, Xi [1]
Guo, Xiaohu [2]
Weng, Yue [1]
Zhang, Xianwei [1]
Lu, Yutong [1]
Zhao, Zhong [3]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, 132 East Outer Ring Rd, Guangzhou 510006, Guangdong, Peoples R China
[2] STFC Daresbury Lab, Hartree Ctr, Keckwick Lane, Warrington WA4 4AD, England
[3] China Aerodynam Res & Dev Ctr, Computat Aerodynam Inst, 6 South Sect,Second Ring Rd, Mianyang 621000, Sichuan, Peoples R China
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Computational fluid dynamics; Unstructured mesh; Compressible flow; Graphic processing units; Optimizations; Scalability; SOLVERS
DOI
10.1016/j.future.2022.09.005
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Porting unstructured Computational Fluid Dynamics (CFD) analysis of compressible flow to Graphics Processing Units (GPUs) faces two difficulties. First, indirect data access induces non-coalesced access to the GPU's global memory, which causes performance loss. Second, data exchange among multiple GPUs is complex because it involves both communication between processes and transfers between host and device, which degrades scalability. To increase data locality in unstructured finite volume GPU simulations of compressible flow, we apply several optimizations: cell and face renumbering, data dependence resolving, nested loop splitting, and loop mode adjustment. We then establish a hybrid MPI-CUDA parallel framework that packs and unpacks exchange data on the GPU for multi-GPU computing. After these optimizations, the performance of the whole application on a GPU increases by around 50%. Simulations of ONERA M6 cases on a single GPU (Nvidia Tesla V100) achieve an average speedup of 13.4x compared to those on 28 CPU cores (Intel Xeon Gold 6132). With 2 GPUs as the baseline, strong scaling results show a parallel efficiency of 42% on 200 GPUs, while weak scaling tests give a parallel efficiency of 82.4% up to 200 GPUs. (c) 2022 Elsevier B.V. All rights reserved.
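To illustrate why cell and face renumbering can improve data locality, the sketch below relabels the cells of a tiny mesh with a breadth-first traversal and compares the largest index gap across any face before and after. The abstract does not say which renumbering scheme the authors use, so the breadth-first ordering, the function names (`bfs_renumber`, `renumber_faces`, `max_index_gap`), and the 8-cell chain mesh are all assumptions made for illustration only.

```python
from collections import deque


def bfs_renumber(num_cells, faces):
    """Return new_id, where new_id[old] is the breadth-first label of cell old.

    Assumption: a plain BFS ordering stands in for whatever renumbering
    scheme the paper actually uses.
    """
    # Build the cell adjacency graph from interior faces (owner, neighbour).
    adjacency = [[] for _ in range(num_cells)]
    for owner, neighbour in faces:
        adjacency[owner].append(neighbour)
        adjacency[neighbour].append(owner)

    new_id = [-1] * num_cells
    next_id = 0
    for seed in range(num_cells):  # loop over seeds to cover disconnected meshes
        if new_id[seed] != -1:
            continue
        new_id[seed] = next_id
        next_id += 1
        queue = deque([seed])
        while queue:
            cell = queue.popleft()
            for other in sorted(adjacency[cell]):
                if new_id[other] == -1:
                    new_id[other] = next_id
                    next_id += 1
                    queue.append(other)
    return new_id


def renumber_faces(faces, new_id):
    """Relabel face endpoints, then sort faces so a face loop walks cells in order."""
    relabeled = [tuple(sorted((new_id[a], new_id[b]))) for a, b in faces]
    return sorted(relabeled)


def max_index_gap(faces):
    """Largest |owner - neighbour| over all faces; smaller means better locality."""
    return max(abs(a - b) for a, b in faces)


# Hypothetical mesh: a chain of 8 cells whose labels are scrambled.
faces = [(0, 4), (4, 1), (1, 5), (5, 2), (2, 6), (6, 3), (3, 7)]
new_id = bfs_renumber(8, faces)
new_faces = renumber_faces(faces, new_id)
print(max_index_gap(faces), max_index_gap(new_faces))
```

In a face-based finite volume loop, each face reads the data of its two adjacent cells through an index array. When those two indices are close together, and consecutive faces touch consecutive cells, neighbouring GPU threads are more likely to read nearby memory addresses, which is the condition for coalesced access.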
Pages: 1-16
Number of pages: 16