Method for scalable and performant GPU-accelerated simulation of multiphase compressible flow

Cited by: 1
Authors
Radhakrishnan, Anand [1]
Le Berre, Henry [1]
Wilfong, Benjamin [1]
Spratt, Jean-Sebastien [3]
Rodriguez Jr, Mauro [4]
Colonius, Tim [3]
Bryngelson, Spencer H. [1,2]
Affiliations
[1] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Daniel Guggenheim Sch Aerosp Engn, Atlanta, GA 30332 USA
[3] CALTECH, Div Engn & Appl Sci, Pasadena, CA 91125 USA
[4] Brown Univ, Sch Engn, Providence, RI 02912 USA
Funding
National Science Foundation (USA)
Keywords
Computational fluid dynamics; Heterogeneous computing; Multiphase flows; Riemann problem; Relaxation; Interfaces; Fluids
DOI
10.1016/j.cpc.2024.109238
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Multiphase compressible flows are often characterized by a broad range of space and time scales, entailing large grids and small time steps. Simulations of these flows on CPU-based clusters can thus take several wall-clock days. Offloading the compute kernels to GPUs appears attractive, but many finite-volume and finite-difference methods are memory-bound, which dampens speedups. Even when speedups are realized, the lower computation cost of GPU-based kernels makes communication and I/O times a larger share of the run time. We present a strategy for GPU acceleration of multiphase compressible flow solvers that addresses these challenges and obtains large speedups at scale. We use OpenACC for directive-based offloading of all compute kernels while maintaining low-level control when needed. An established Fortran preprocessor and metaprogramming tool, Fypp, enables otherwise hidden compile-time optimizations. This strategy exposes compile-time optimizations and high memory reuse while retaining readable, maintainable, and compact code. Remote direct memory access realized via CUDA-aware MPI and GPUDirect reduces halo-exchange communication time. We implement this approach in the open-source solver MFC [1]. Metaprogramming results in an 8-times speedup of the most expensive kernels compared to a statically compiled program, reaching 46% of peak FLOPs on modern NVIDIA GPUs and high arithmetic intensity (about 10 FLOPs/byte). In representative simulations, a single NVIDIA A100 GPU is 7 times faster than an Intel Xeon Cascade Lake (6248) CPU die, or about 300 times faster than a single such CPU core. At the same time, near-ideal (97%) weak scaling is observed for at least 13824 GPUs on OLCF Summit. A strong scaling efficiency of 84% is retained for an 8-times increase in GPU count. Collective I/O, implemented via MPI-3, helps ensure the negligible contribution of data transfers (< 1% of the wall time for a typical, large simulation). Large many-GPU simulations of compressible (solid-)liquid-gas flows demonstrate the practical utility of this strategy.
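The scaling figures quoted in the abstract can be turned into more tangible numbers with a short calculation. The sketch below is not taken from the paper or from MFC. It assumes the usual textbook definitions: strong-scaling efficiency is the achieved speedup divided by the increase in device count, and weak-scaling efficiency is the baseline wall time divided by the scaled-up wall time. The function names are hypothetical.

```python
# Illustrative only: assumes the conventional definitions of scaling
# efficiency; the function names are hypothetical, not part of MFC.

def strong_scaling_speedup(efficiency: float, device_factor: float) -> float:
    """Speedup implied by a strong-scaling efficiency.

    efficiency = speedup / device_factor, so speedup = efficiency * device_factor.
    """
    return efficiency * device_factor


def weak_scaling_time_ratio(efficiency: float) -> float:
    """Wall-time growth implied by a weak-scaling efficiency.

    efficiency = t_base / t_scaled, so t_scaled / t_base = 1 / efficiency.
    """
    return 1.0 / efficiency


if __name__ == "__main__":
    # 84% strong-scaling efficiency over an 8-times increase in GPU count
    speedup = strong_scaling_speedup(0.84, 8)
    # 97% weak-scaling efficiency
    slowdown = weak_scaling_time_ratio(0.97)
    print(f"implied strong-scaling speedup: {speedup:.2f}x")
    print(f"implied weak-scaling wall-time growth: {100 * (slowdown - 1):.1f}%")
```

Under these assumed definitions, the 84% strong-scaling figure corresponds to a speedup of roughly 6.7 on 8 times as many GPUs, and the 97% weak-scaling figure corresponds to a wall-time increase of about 3% relative to the baseline run.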
Pages: 11