Parametric GPU Code Generation for Affine Loop Programs

Cited by: 3
Authors
Konstantinidis, Athanasios [1]
Kelly, Paul H. J. [1]
Ramanujam, J. [2]
Sadayappan, P. [3]
Affiliations
[1] Imperial Coll London, London, England
[2] Louisiana State Univ, Baton Rouge, LA USA
[3] Ohio State Univ, Columbus, OH 43210 USA
Funding
National Science Foundation (USA); Engineering and Physical Sciences Research Council (UK);
Keywords
EFFICIENT SOLUTIONS; SCHEDULING PROBLEM;
DOI
10.1007/978-3-319-09967-5_8
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Partitioning a parallel computation into finite-sized chunks for effective mapping onto a parallel machine is a critical concern for source-to-source compilation. In the context of OpenCL and CUDA, this translates to the definition of a uniform hyper-rectangular partitioning of the parallel execution space, where each partition is subject to a fine-grained distribution of resources that has a direct yet hard-to-estimate impact on performance. This paper develops the first compilation scheme for generating parametrically tiled codes for affine loop programs on GPUs, which facilitates run-time exploration of partitioning parameters as a fast and portable way of finding the ones that yield maximum performance. Our approach is based on a parametric tiling scheme for producing wavefronts of parallel rectangular partitions of parametric size, and a novel runtime system that manages wavefront execution and local memory usage dynamically through an inspector-executor mechanism. An experimental evaluation demonstrates the effectiveness of our approach for wavefront as well as rectangularly-parallel partitionings.
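
As a rough illustration of the idea of wavefronts of rectangular tiles whose sizes are run-time parameters, the following is a minimal CPU-side Python sketch. It is not the paper's compilation scheme or runtime system, and it does not model the inspector-executor mechanism or GPU local memory. The function names (`wavefront_tiles`, `run_tiled`, `run_sequential`) and the example recurrence are assumptions chosen only for illustration.

```python
from math import ceil


def wavefront_tiles(n, m, ts_i, ts_j):
    """Yield one list of tile coordinates per wavefront.

    The n x m iteration space is cut into rectangular tiles of size
    ts_i x ts_j (both are ordinary run-time parameters). Tiles with the
    same ti + tj form one wavefront.
    """
    nt_i, nt_j = ceil(n / ts_i), ceil(m / ts_j)
    for w in range(nt_i + nt_j - 1):
        lo = max(0, w - nt_j + 1)
        hi = min(nt_i, w + 1)
        yield [(ti, w - ti) for ti in range(lo, hi)]


def run_tiled(n, m, ts_i, ts_j):
    """Evaluate a[i][j] = a[i-1][j] + a[i][j-1] tile by tile."""
    a = [[1] * m for _ in range(n)]
    for front in wavefront_tiles(n, m, ts_i, ts_j):
        # Tiles in one wavefront only read from tiles of earlier
        # wavefronts, so on a GPU they could run as parallel work-groups.
        for ti, tj in front:
            for i in range(max(1, ti * ts_i), min(n, (ti + 1) * ts_i)):
                for j in range(max(1, tj * ts_j), min(m, (tj + 1) * ts_j)):
                    a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def run_sequential(n, m):
    """Reference result computed in the original loop order."""
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a
```

Because the tile sizes are plain arguments, different values can be tried at run time without regenerating the code, which is the kind of parameter exploration the abstract refers to; for example, `run_tiled(7, 5, 3, 2)` and `run_tiled(7, 5, 4, 4)` should both agree with `run_sequential(7, 5)`.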
Pages: 136-151
Number of pages: 16
Related papers
50 in total
  • [1] Automatic C-to-CUDA Code Generation for Affine Programs
    Baskaran, Muthu Manikandan
    Ramanujam, J.
    Sadayappan, P.
    [J]. COMPILER CONSTRUCTION, PROCEEDINGS, 2010, 6011 : 244 - +
  • [2] A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs
    Xu, Jinchen
    Song, Guanghui
    Zhou, Bei
    Li, Fei
    Hao, Jiangwei
    Zhao, Jie
    [J]. PROCEEDINGS OF THE 29TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2024, 2024, : 55 - 67
  • [3] Automated Generation of Polyhedral Process Networks from Affine Nested-Loop Programs with Dynamic Loop Bounds
    Nadezhkin, Dmitry
    Nikolov, Hristo
    Stefanov, Todor
    [J]. ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2013, 13
  • [4] A code generation algorithm for affine partitioning framework
    Liao, SW
    Du, ZH
    Wu, GS
    Lueh, GY
    [J]. 11th International Conference on Parallel and Distributed Systems Workshops, Vol II, Proceedings, 2005, : 17 - 21
  • [5] AFFINE-BY-STATEMENT SCHEDULING OF UNIFORM AND AFFINE LOOP NESTS OVER PARAMETRIC DOMAINS
    DARTE, A
    ROBERT, Y
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1995, 29 (01) : 43 - 59
  • [6] GPU Code Generation of Cardiac Electrophysiology Simulation with MLIR
    Jost, Tiago Trevisan
    Thangamani, Arun
    Colin, Raphael
    Loechner, Vincent
    Genaud, Stephane
    Bramas, Berenger
    [J]. EURO-PAR 2023: PARALLEL PROCESSING, 2023, 14100 : 549 - 563
  • [7] Accelerating The Virtual Brain with code generation and GPU computing
    M Marmaduke Woodman
    Viktor K Jirsa
    [J]. BMC Neuroscience, 14 (Suppl 1)
  • [8] Translating imperative affine nested loop programs into process networks
    Deprettere, EF
    Rijpkema, E
    Kienhuis, B
    [J]. EMBEDDED PROCESSOR DESIGN CHALLENGES: SYSTEMS, ARCHITECTURES, MODELLING, AND SIMULATION - SAMOS, 2002, 2268 : 89 - 111
  • [9] Communication code generation in systems of affine recurrence equations
    Reffay, C
    Perrin, GR
    [J]. INTEGRATION-THE VLSI JOURNAL, 1995, 20 (01) : 63 - 83
  • [10] Scalable GPU Communication with Code Generation on Stencil Applications
    Tozatti Risso, Joao Victor
    Bauer, Martin
    de Carvalho, Paulo Roberto, Jr.
    Ruede, Ulrich
    Weingaertner, Daniel
    [J]. 2019 31ST INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2019), 2019, : 88 - 95