LIFT: A Functional Data-Parallel IR for High-Performance GPU Code Generation

Cited by: 0
Authors
Steuwer, Michel [1]
Remmelg, Toomas [1]
Dubach, Christophe [1]
Affiliations
[1] Univ Edinburgh, Edinburgh, Midlothian, Scotland
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Parallel patterns (e.g., map, reduce) have gained traction as an abstraction for targeting parallel accelerators and are a promising answer to the performance portability problem. However, compiling high-level programs into efficient low-level parallel code is challenging. Current approaches start from a high-level parallel IR and emit GPU code directly in one big step. They use fixed strategies to optimize and map parallelism, exploiting properties of a particular GPU generation, which leads to performance portability issues. We introduce the Lift IR, a new data-parallel IR that encodes OpenCL-specific constructs as functional patterns. Our prior work has shown that this functional nature simplifies the exploration of optimizations and the mapping of parallelism from portable high-level programs using rewrite rules. This paper describes how Lift IR programs are compiled into efficient OpenCL code. This is non-trivial because many performance-sensitive details, such as memory allocation, array accesses, or synchronization, are not explicitly represented in the Lift IR. We present techniques that overcome this challenge by exploiting the patterns' high-level semantics. Our evaluation shows that the Lift IR is flexible enough to express GPU programs with complex optimizations, achieving performance on par with manually optimized code.
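As a rough illustration of the pattern-based style the abstract refers to, the following Python sketch composes a dot product from map and reduce, then shows a semantically equivalent chunked variant of the kind a rewrite rule might produce. This is not Lift IR syntax and not the authors' implementation; the function names `dot` and `dot_chunked` and the `chunk` parameter are hypothetical, chosen only for this example.

```python
from functools import reduce
from operator import add, mul


def dot(xs, ys):
    """Dot product as a composition of two patterns.

    map pattern: multiply corresponding elements.
    reduce pattern: sum the products, starting from 0.0.
    """
    return reduce(add, map(mul, xs, ys), 0.0)


def dot_chunked(xs, ys, chunk=4):
    """Same result as dot(), restructured into per-chunk partial reductions.

    Assumption for illustration: each chunk could be handled by an
    independent group of parallel workers, with a final reduction
    combining the partial sums.
    """
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    partials = [
        dot(xs[i:i + chunk], ys[i:i + chunk])
        for i in range(0, len(xs), chunk)
    ]
    return reduce(add, partials, 0.0)


if __name__ == "__main__":
    a = list(range(10))
    print(dot(a, a), dot_chunked(a, a, chunk=4))
```

Both functions compute the same value; only the structure of the computation differs, which is the kind of freedom a functional representation leaves open when deciding how to map work onto hardware.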
Pages: 74-85
Page count: 12