Method for scalable and performant GPU-accelerated simulation of multiphase compressible flow

Cited by: 1
Authors
Radhakrishnan, Anand [1]
Le Berre, Henry [1]
Wilfong, Benjamin [1]
Spratt, Jean-Sebastien [3]
Rodriguez Jr, Mauro [4]
Colonius, Tim [3]
Bryngelson, Spencer H. [1, 2]
Affiliations
[1] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Daniel Guggenheim Sch Aerosp Engn, Atlanta, GA 30332 USA
[3] CALTECH, Div Engn & Appl Sci, Pasadena, CA 91125 USA
[4] Brown Univ, Sch Engn, Providence, RI 02912 USA
Funding
U.S. National Science Foundation
Keywords
Computational fluid dynamics; Heterogeneous computing; Multiphase flows; Riemann problem; Relaxation; Interfaces; Fluids
DOI
10.1016/j.cpc.2024.109238
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Multiphase compressible flows are often characterized by a broad range of space and time scales, entailing large grids and small time steps. Simulations of these flows on CPU-based clusters can thus take several wall-clock days. Offloading the compute kernels to GPUs is attractive, but many finite-volume and finite-difference methods are memory-bound, which dampens speedups. Even when speedups are realized, communication and I/O account for a larger share of the run time because computation costs are lower. We present a strategy for GPU acceleration of multiphase compressible flow solvers that addresses these challenges and obtains large speedups at scale. We use OpenACC for directive-based offloading of all compute kernels while maintaining low-level control when needed. An established Fortran preprocessor and metaprogramming tool, Fypp, enables otherwise hidden compile-time optimizations. This strategy exposes compile-time optimizations and high memory reuse while retaining readable, maintainable, and compact code. Remote direct memory access realized via CUDA-aware MPI and GPUDirect reduces halo-exchange communication time. We implement this approach in the open-source solver MFC [1]. Metaprogramming results in an 8-times speedup of the most expensive kernels compared to a statically compiled program, reaching 46% of peak FLOPs on modern NVIDIA GPUs and high arithmetic intensity (about 10 FLOPs/byte). In representative simulations, a single NVIDIA A100 GPU is 7 times faster than an Intel Xeon Cascade Lake (6248) CPU die, or about 300 times faster than a single such CPU core. At the same time, near-ideal (97%) weak scaling is observed for at least 13824 GPUs on OLCF Summit. A strong scaling efficiency of 84% is retained for an 8-times increase in GPU count. Collective I/O, implemented via MPI3, helps ensure the negligible contribution of data transfers (<1% of the wall time for a typical, large simulation).
Large many-GPU simulations of compressible (solid-)liquid-gas flows demonstrate the practical utility of this strategy.
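The scaling figures quoted in the abstract can be read through the conventional definitions of weak and strong scaling efficiency. The following Python sketch is illustrative only: the function names are hypothetical, the definitions are the standard textbook ones (the record does not state which definitions the authors used), and the only numbers taken from the abstract are the 84% efficiency and the 8-times increase in GPU count.

```python
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak scaling: problem size grows with device count, so the ideal
    wall time stays constant. Efficiency = t_base / t_scaled."""
    return t_base / t_scaled


def strong_scaling_efficiency(t_base: float, t_scaled: float,
                              device_ratio: float) -> float:
    """Strong scaling: problem size is fixed, so the ideal wall time
    shrinks by device_ratio. Efficiency = t_base / (device_ratio * t_scaled)."""
    return t_base / (device_ratio * t_scaled)


def speedup_from_strong_efficiency(efficiency: float,
                                   device_ratio: float) -> float:
    """Invert the strong-scaling definition to get the realized speedup."""
    return efficiency * device_ratio


# Values from the abstract: 84% strong scaling efficiency for 8x more GPUs.
# Under the assumed definition this implies the speedup printed below.
implied = speedup_from_strong_efficiency(0.84, 8)
print(f"implied speedup on 8x GPUs: {implied:.2f}x")
```

Under these assumed definitions, an 84% strong scaling efficiency over an 8-times increase in GPU count corresponds to a speedup of roughly 6.7 rather than the ideal 8.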
Pages: 11