Method for scalable and performant GPU-accelerated simulation of multiphase compressible flow

Cited by: 1
Authors
Radhakrishnan, Anand [1]
Le Berre, Henry [1]
Wilfong, Benjamin [1]
Spratt, Jean-Sebastien [3]
Rodriguez Jr, Mauro [4]
Colonius, Tim [3]
Bryngelson, Spencer H. [1, 2]
Affiliations
[1] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Daniel Guggenheim Sch Aerosp Engn, Atlanta, GA 30332 USA
[3] CALTECH, Div Engn & Appl Sci, Pasadena, CA 91125 USA
[4] Brown Univ, Sch Engn, Providence, RI 02912 USA
Funding
National Science Foundation (USA);
Keywords
Computational fluid dynamics; Heterogeneous computing; Multiphase flows; Riemann problem; Relaxation; Interfaces; Fluids;
DOI
10.1016/j.cpc.2024.109238
Chinese Library Classification number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Multiphase compressible flows are often characterized by a broad range of space and time scales, entailing large grids and small time steps. Simulations of these flows on CPU-based clusters can thus take several wall-clock days. Offloading the compute kernels to GPUs appears attractive but is memory-bound for many finite-volume and -difference methods, damping speedups. Even when realized, GPU-based kernels lead to more intrusive communication and I/O times owing to lower computation costs. We present a strategy for GPU acceleration of multiphase compressible flow solvers that addresses these challenges and obtains large speedups at scale. We use OpenACC for directive-based offloading of all compute kernels while maintaining low-level control when needed. An established Fortran preprocessor and metaprogramming tool, Fypp, enables otherwise hidden compile-time optimizations. This strategy exposes compile-time optimizations and high memory reuse while retaining readable, maintainable, and compact code. Remote direct memory access realized via CUDA-aware MPI and GPUDirect reduces halo-exchange communication time. We implement this approach in the open-source solver MFC [1]. Metaprogramming results in an 8-times speedup of the most expensive kernels compared to a statically compiled program, reaching 46% of peak FLOPs on modern NVIDIA GPUs and high arithmetic intensity (about 10 FLOPs/byte). In representative simulations, a single NVIDIA A100 GPU is 7-times faster compared to an Intel Xeon Cascade Lake (6248) CPU die, or about 300-times faster compared to a single such CPU core. At the same time, near-ideal (97%) weak scaling is observed for at least 13824 GPUs on OLCF Summit. A strong scaling efficiency of 84% is retained for an 8-times increase in GPU count. Collective I/O, implemented via MPI3, helps ensure the negligible contribution of data transfers (< 1% of the wall time for a typical, large simulation). Large many-GPU simulations of compressible (solid-)liquid-gas flows demonstrate the practical utility of this strategy.
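The scaling figures quoted in the abstract can be read through the conventional definitions of parallel efficiency. The sketch below is illustrative only: it assumes strong-scaling efficiency is speedup divided by the increase in device count, and weak-scaling efficiency is the ratio of base wall time to scaled wall time. The abstract does not state the exact definitions or baselines used, and the function names are hypothetical.

```python
def strong_scaling_speedup(efficiency: float, device_factor: float) -> float:
    """Speedup implied by a strong-scaling efficiency (assumed: speedup / device_factor)."""
    return efficiency * device_factor


def weak_scaling_time_ratio(efficiency: float) -> float:
    """Wall-time ratio T_N / T_1 implied by a weak-scaling efficiency (assumed: T_1 / T_N)."""
    return 1.0 / efficiency


if __name__ == "__main__":
    # 84% strong-scaling efficiency over an 8-times increase in GPU count
    print(round(strong_scaling_speedup(0.84, 8), 2))  # 6.72
    # 97% weak-scaling efficiency
    print(round(weak_scaling_time_ratio(0.97), 3))    # 1.031
```

Under these assumed definitions, the 84% strong-scaling figure corresponds to roughly a 6.7-times speedup on 8 times as many GPUs, and the 97% weak-scaling figure to about a 3% increase in wall time at fixed work per GPU.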
Pages: 11