High Performance Stencil Code Generation with LIFT

Cited by: 65
Authors
Hagedorn, Bastian [1]
Stoltzfus, Larisa [2]
Steuwer, Michel [3]
Gorlatch, Sergei [1]
Dubach, Christophe [2]
Affiliations
[1] Univ Munster, Munster, Germany
[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[3] Univ Glasgow, Glasgow, Lanark, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Code Generation; Stencil; GPU Computing; Performance Portability; Lift;
DOI
10.1145/3168824
Chinese Library Classification
TP31 [Computer Software];
Discipline classification codes
081202; 0835;
Abstract
Stencil computations are widely used, from physical simulations to machine learning. They are embarrassingly parallel and fit modern hardware such as Graphics Processing Units (GPUs) well. Although stencil computations have been extensively studied, optimizing them for increasingly diverse hardware remains challenging. Domain Specific Languages (DSLs) have raised the programming abstraction and offer good performance. However, this places the burden on DSL implementers, who have to write almost full-fledged parallelizing compilers and optimizers. LIFT has recently emerged as a promising approach to achieving performance portability and is based on a small set of reusable parallel primitives that DSL or library writers can build upon. LIFT's key novelty is its encoding of optimizations as a system of extensible rewrite rules, which are used to explore the optimization space. However, LIFT has mostly focused on linear algebra operations, and it remains to be seen whether this approach is applicable to other domains. This paper demonstrates how complex multidimensional stencil code and optimizations such as tiling are expressible using compositions of simple 1D LIFT primitives. By leveraging existing LIFT primitives and optimizations, we only require the addition of two primitives and one rewrite rule to do so. Our results show that this approach outperforms existing compiler approaches and hand-tuned codes.
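As an illustration of the abstract's central claim, the following is a minimal Python sketch of how a multidimensional stencil can be built from compositions of simple 1D operations. It is not LIFT code and does not reproduce LIFT's API: the names `pad`, `slide`, `transpose`, `pad_2d`, `slide_2d`, `stencil_1d`, and `stencil_2d`, the clamping boundary handling, and the fixed 3-point and 3x3 neighborhoods are all assumptions chosen for this example.

```python
from typing import Callable, List, Sequence


def pad(left: int, right: int, xs: Sequence) -> List:
    """Extend xs at both ends by repeating the boundary element (clamping).

    Clamping is an assumption of this sketch; other boundary rules are possible.
    """
    if not xs:
        raise ValueError("cannot pad an empty sequence")
    return [xs[0]] * left + list(xs) + [xs[-1]] * right


def slide(size: int, step: int, xs: Sequence) -> List[List]:
    """Return the overlapping windows of `size` elements, advancing by `step`."""
    return [list(xs[i:i + size]) for i in range(0, len(xs) - size + 1, step)]


def transpose(rows: Sequence[Sequence]) -> List[List]:
    """Swap the two outermost dimensions of a nested list."""
    return [list(col) for col in zip(*rows)]


def stencil_1d(f: Callable, xs: Sequence) -> List:
    """3-point stencil: pad the input, create neighborhoods, map f over them."""
    return [f(window) for window in slide(3, 1, pad(1, 1, xs))]


def pad_2d(left: int, right: int, grid: Sequence[Sequence]) -> List[List]:
    """2D padding composed from the 1D pad: pad the rows, then pad each row."""
    return [pad(left, right, row) for row in pad(left, right, grid)]


def slide_2d(size: int, step: int, grid: Sequence[Sequence]) -> List:
    """2D neighborhoods composed from the 1D slide and transpose."""
    windows_per_row = [slide(size, step, row) for row in grid]
    row_groups = slide(size, step, windows_per_row)
    # Each group holds `size` rows of 1D windows; transposing it yields
    # one size-by-size window per output column.
    return [transpose(group) for group in row_groups]


def stencil_2d(f: Callable, grid: Sequence[Sequence]) -> List[List]:
    """3x3 stencil expressed only through the 1D building blocks above."""
    return [[f(window) for window in row]
            for row in slide_2d(3, 1, pad_2d(1, 1, grid))]
```

For example, `stencil_1d(sum, [1, 2, 3, 4])` sums each clamped 3-point neighborhood, and `stencil_2d(lambda w: sum(map(sum, w)), grid)` does the same for 3x3 neighborhoods. The 2D versions introduce no new windowing logic; they only reuse the 1D functions on nested lists.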
Pages: 100-112
Page count: 13
Related papers
  • [1] High Performance Stencil Code Algorithms for GPGPUs
    Schaefer, Andreas
    Fey, Dietmar
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS), 2011, 4 : 2027 - 2036
  • [2] High Performance Code Generation for Stencil Computation on Heterogeneous Multi-device Architectures
    Li, Pei
    Brunet, Elisabeth
    Namyst, Raymond
    [J]. 2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 1512 - 1518
  • [3] Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations
    Rawat, Prashant Singh
    Vaidya, Miheer
    Sukumaran-Rajam, Aravind
    Ravishankar, Mahesh
    Grover, Vinod
    Rountev, Atanas
    Pouchet, Louis-Noel
    Sadayappan, P.
    [J]. PROCEEDINGS OF THE IEEE, 2018, 106 (11) : 1902 - 1920
  • [4] Enabling efficient stencil code generation in OpenACC
    Pereira, Alyson D.
    Rocha, Rodrigo C. O.
    Castro, Marcio
    Goes, Luis F. W.
    Dantas, Mario A. R.
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS 2017), 2017, 108 : 2333 - 2337
  • [5] LIFT: A Functional Data-Parallel IR for High-Performance GPU Code Generation
    Steuwer, Michel
    Remmelg, Toomas
    Dubach, Christophe
    [J]. CGO'17: PROCEEDINGS OF THE 2017 INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, 2017, : 74 - 85
  • [6] Understanding Stencil Code Performance On MultiCore Architectures
    Rahman, Shah M. Faizur
    Yi, Qing
    Qasem, Apan
    [J]. PROCEEDINGS OF THE 2011 8TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF 2011), 2011,
  • [7] Scalable GPU Communication with Code Generation on Stencil Applications
    Tozatti Risso, Joao Victor
    Bauer, Martin
    de Carvalho, Paulo Roberto, Jr.
    Ruede, Ulrich
    Weingaertner, Daniel
    [J]. 2019 31ST INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2019), 2019, : 88 - 95
  • [8] A Performance Model and Efficiency-Based Assignment of Buffering Strategies for Automatic GPU Stencil Code Generation
    Hu, Yue
    Koppelman, David M.
    Brandt, Steven R.
    [J]. 2016 IEEE 10TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC), 2016, : 361 - 368
  • [9] Extending OpenACC for Efficient Stencil Code Generation and Execution by Skeleton Frameworks
    Pereira, Alyson D.
    Castro, Marcio
    Dantas, Mario A. R.
    Rocha, Rodrigo C. O.
    Goes, Luis F. W.
    [J]. 2017 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2017, : 719 - 726
  • [10] Highly Optimized Code Generation for Stencil Codes with Computation Reuse for GPUs
    Wen-Jing Ma
    Kan Gao
    Guo-Ping Long
    [J]. Journal of Computer Science and Technology, 2016, 31 : 1262 - 1274