Vectorization and Minimization of Memory Footprint for Linear High-Order Discontinuous Galerkin Schemes

Times cited: 0

Authors:

Gallard, Jean-Matthieu [1]

Rannabauer, Leonhard [1]

Reinarz, Anne [1]

Bader, Michael [1]

Affiliation:

[1] Technical University of Munich, Department of Informatics, Munich, Germany

Source:

2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2020) | 2020

Keywords:

ExaHyPE; Code Generation; High-Order Discontinuous Galerkin; ADER; Hyperbolic PDE Systems; Vectorization; Array-of-Struct-of-Array

DOI:

10.1109/IPDPSW50202.2020.00126

CLC classification number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

We present a sequence of optimizations to the performance-critical compute kernels of the high-order discontinuous Galerkin solver of the hyperbolic PDE engine ExaHyPE. These optimizations successively tackle bottlenecks due to SIMD operations, cache hierarchies and restrictions in the software design. We start from a generic scalar implementation of the numerical scheme. Our first optimized variant applies state-of-the-art optimization techniques: it vectorizes loops, improves the data layout, and uses Loop-over-GEMM to perform tensor contractions via highly optimized matrix multiplication functions provided by the LIBXSMM library. We show that memory stalls hindered the vectorization gains, because the kernel's memory footprint exceeded the L2 cache size. We therefore introduce a new kernel that applies a sum factorization approach to reduce the kernel's memory footprint and improve its cache locality. With the L2 cache bottleneck removed, we were able to exploit additional vectorization opportunities. To do so, we introduce a hybrid Array-of-Structure-of-Array data layout that resolves the data layout conflict between the matrix multiplication kernels and the point-wise functions that implement PDE-specific terms. We evaluated this last kernel in a benchmark simulation at high polynomial order. Only 2% of its floating point operations are still performed using scalar instructions, and it achieves 22.5% of the available performance.
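
The abstract names sum factorization and Loop-over-GEMM but does not show how they work. The Python sketch below is a generic illustration only. It is not ExaHyPE's generated kernel code, and it is not necessarily the exact factorization the paper applies. All function names are hypothetical.

The sketch applies a 2D tensor-product operator (A kron A) to an n x n coefficient array in two ways:

- by assembling the full n^2 x n^2 matrix;
- by contracting one dimension at a time as two small matrix-matrix products.

The factorized path needs only a few n x n arrays and 2n^3 multiply-adds instead of n^4. In d dimensions the comparison becomes d*n^(d+1) versus n^(2d).

```python
# Generic illustration of sum factorization for a tensor-product operator.
# Hypothetical helper names; this is not ExaHyPE's generated kernel code.


def matmul(X, Y):
    """Plain GEMM, (p x q) times (q x r); stands in for an optimized small-GEMM routine."""
    q, r = len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(q)) for j in range(r)]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def kron(A, B):
    """Explicit Kronecker product of two square matrices."""
    m = len(B)
    size = len(A) * m
    return [[A[i // m][k // m] * B[i % m][k % m] for k in range(size)]
            for i in range(size)]


def apply_dense(A, u):
    """Apply (A kron A) to the n x n array u by assembling the full operator.
    Stores an n^2 x n^2 matrix and performs n^4 multiply-adds for the product."""
    n = len(A)
    K = kron(A, A)
    flat = [u[k][l] for k in range(n) for l in range(n)]  # row-major flattening
    res = [sum(K[r][c] * flat[c] for c in range(n * n)) for r in range(n * n)]
    return [res[i * n:(i + 1) * n] for i in range(n)]


def apply_sum_factorized(A, u):
    """Same operator applied one dimension at a time (Loop-over-GEMM style).
    Uses only n x n arrays and 2 n^3 multiply-adds."""
    t = matmul(u, transpose(A))  # contract the second index: t[k][j] = sum_l u[k][l] * A[j][l]
    return matmul(A, t)          # contract the first index: out[i][j] = sum_k A[i][k] * t[k][j]


if __name__ == "__main__":
    A = [[1.0, 2.0, 0.0], [0.5, -1.0, 3.0], [2.0, 0.0, 1.0]]
    u = [[1.0, 0.0, 2.0], [-1.0, 4.0, 0.5], [3.0, 1.0, 1.0]]
    dense = apply_dense(A, u)
    fast = apply_sum_factorized(A, u)
    assert all(abs(dense[i][j] - fast[i][j]) < 1e-12
               for i in range(3) for j in range(3))
    print(fast[0][0])  # 15.0
```

In an optimized kernel, each of the two small products would be dispatched to a tuned GEMM routine, such as the small matrix multiplication kernels LIBXSMM generates. The pure-Python loops here only demonstrate that both paths compute the same result.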

Pages: 711-720

Number of pages: 10
