Method for scalable and performant GPU-accelerated simulation of multiphase compressible flow

Cited by: 1

Authors:

Radhakrishnan, Anand [1]

Le Berre, Henry [1]

Wilfong, Benjamin [1]

Spratt, Jean-Sebastien [3]

Rodriguez Jr, Mauro [4]

Colonius, Tim [3]

Bryngelson, Spencer H. [1,2]

Affiliations:

[1] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA

[2] Georgia Inst Technol, Daniel Guggenheim Sch Aerosp Engn, Atlanta, GA 30332 USA

[3] CALTECH, Div Engn & Appl Sci, Pasadena, CA 91125 USA

[4] Brown Univ, Sch Engn, Providence, RI 02912 USA

Source:

COMPUTER PHYSICS COMMUNICATIONS | 2024 / Vol. 302

Funding:

U.S. National Science Foundation;

Keywords:

Computational fluid dynamics; Heterogeneous computing; Multiphase flows; RIEMANN PROBLEM; RELAXATION; INTERFACES; FLUIDS;

DOI:

10.1016/j.cpc.2024.109238

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Multiphase compressible flows are often characterized by a broad range of space and time scales, entailing large grids and small time steps. Simulations of these flows on CPU-based clusters can thus take several wall-clock days. Offloading the compute kernels to GPUs appears attractive but is memory-bound for many finite-volume and -difference methods, damping speedups. Even when realized, GPU-based kernels lead to more intrusive communication and I/O times owing to lower computation costs. We present a strategy for GPU acceleration of multiphase compressible flow solvers that addresses these challenges and obtains large speedups at scale. We use OpenACC for directive-based offloading of all compute kernels while maintaining low-level control when needed. An established Fortran preprocessor and metaprogramming tool, Fypp, enables otherwise hidden compile-time optimizations. This strategy exposes compile-time optimizations and high memory reuse while retaining readable, maintainable, and compact code. Remote direct memory access realized via CUDA-aware MPI and GPUDirect reduces halo-exchange communication time. We implement this approach in the open-source solver MFC [1]. Metaprogramming results in an 8-times speedup of the most expensive kernels compared to a statically compiled program, reaching 46% of peak FLOPs on modern NVIDIA GPUs and high arithmetic intensity (about 10 FLOPs/byte). In representative simulations, a single NVIDIA A100 GPU is 7-times faster compared to an Intel Xeon Cascade Lake (6248) CPU die, or about 300-times faster compared to a single such CPU core. At the same time, near-ideal (97%) weak scaling is observed for at least 13824 GPUs on OLCF Summit. A strong scaling efficiency of 84% is retained for an 8-times increase in GPU count. Collective I/O, implemented via MPI3, helps ensure the negligible contribution of data transfers (< 1% of the wall time for a typical, large simulation). Large many-GPU simulations of compressible (solid-)liquid-gas flows demonstrate the practical utility of this strategy.


Pages: 11

