Automatic Code Generation for High-performance Discontinuous Galerkin Methods on Modern Architectures

Cited by: 12
Authors
Kempf, Dominic [1]
Hess, Rene [1]
Muthing, Steffen [1]
Bastian, Peter [1]
Affiliations
[1] Heidelberg Univ, Neuenheimer Feld 205, D-69120 Heidelberg, Germany
Source
Keywords
Code generation; Galerkin methods; high-order; parallel; preconditioners; implementation; manipulation; interface; vector; flow
DOI
10.1145/3424144
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
SIMD vectorization has lately become a key challenge in high-performance computing. However, handwritten explicitly vectorized code often poses a threat to the software's sustainability. In this publication, we solve this sustainability and performance portability issue by enriching the simulation framework dune-pdelab with a code generation approach. The approach is based on the well-known domain-specific language UFL but combines it with loopy, a more powerful intermediate representation for the computational kernel. Given this flexible tool, we present and implement a new class of vectorization strategies for the assembly of Discontinuous Galerkin methods on hexahedral meshes exploiting the finite element's tensor product structure. The performance-optimal variant from this class is chosen by the code generator through an auto-tuning approach. The implementation is done within the open source PDE software framework Dune and the discretization module dune-pdelab. The strength of the proposed approach is illustrated with performance measurements for DG schemes for a scalar diffusion reaction equation and the Stokes equation. In our measurements, we utilize both the AVX2 and the AVX512 instruction sets, achieving 30% to 40% of the machine's theoretical peak performance for one matrix-free application of the operator.
Pages: 31
Related papers
50 in total
  • [1] AUTOMATED CODE GENERATION FOR DISCONTINUOUS GALERKIN METHODS
    Olgaard, Kristian B.
    Logg, Anders
    Wells, Garth N.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2008, 31 (02): : 849 - 864
  • [2] Automatic Code Generation for High-Performance Graph Algorithms
    Peng, Zhen
    Ashraf, Rizwan A.
    Guo, Luanzheng
    Tian, Ruiqin
    Kestor, Gokcen
    2023 32ND INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT, 2023, : 14 - 26
  • [3] Automatic code generation and tuning for stencil kernels on modern shared memory architectures
    Christen, Matthias
    Schenk, Olaf
    Burkhart, Helmar
    COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT, 2011, 26 (3-4): : 205 - 210
  • [4] High-performance architectures for elementary function generation
    Cao, J
    Wei, BWY
    Cheng, J
    ARITH-15 2001: 15TH SYMPOSIUM ON COMPUTER ARITHMETIC, PROCEEDINGS, 2001, : 136 - 144
  • [5] Automatic Generation of OpenCL Code for ARM Architectures
    Afonso, Sergio
    Acosta, Alejandro
    Almeida, Francisco
    EURO-PAR 2016: PARALLEL PROCESSING WORKSHOPS, 2017, 10104 : 96 - 107
  • [6] Branchless Code Generation for Modern Processor Architectures
    Angelou, Alexandros
    Dadaliaris, Antonios
    Dimitriou, Georgios
    Dossis, Michael
    25TH PAN-HELLENIC CONFERENCE ON INFORMATICS WITH INTERNATIONAL PARTICIPATION (PCI2021), 2021, : 300 - 305
  • [7] Performance of discontinuous Galerkin methods for elliptic PDEs
    Castillo, P
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2002, 24 (02): : 524 - 547
  • [8] Efficiency of high-performance discontinuous Galerkin spectral element methods for under-resolved turbulent incompressible flows
    Fehn, Niklas
    Wall, Wolfgang A.
    Kronbichler, Martin
    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN FLUIDS, 2018, 88 (01) : 32 - 54
  • [9] Fast Matrix-Free Discontinuous Galerkin Kernels on Modern Computer Architectures
    Kronbichler, Martin
    Kormann, Katharina
    Pasichnyk, Igor
    Allalen, Momme
    HIGH PERFORMANCE COMPUTING (ISC HIGH PERFORMANCE 2017), 2017, 10266 : 237 - 255
  • [10] AUTOMATIC SYMBOLIC COMPUTATION FOR DISCONTINUOUS GALERKIN FINITE ELEMENT METHODS
    Houston, Paul
    Sime, Nathan
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2018, 40 (03): : C327 - C357