Parametric GPU Code Generation for Affine Loop Programs

Times cited: 3

Authors:

Konstantinidis, Athanasios [1]

Kelly, Paul H. J. [1]

Ramanujam, J. [2]

Sadayappan, P. [3]

Affiliations:

[1] London Imperial Coll, London, England

[2] Louisiana State Univ, Baton Rouge, LA USA

[3] Ohio State Univ, Columbus, OH 43210 USA

Source:

LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2013 | 2014 / Vol. 8664

Funding:

National Science Foundation (USA); Engineering and Physical Sciences Research Council (UK);

Keywords:

EFFICIENT SOLUTIONS; SCHEDULING PROBLEM;

DOI:

10.1007/978-3-319-09967-5_8

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

Partitioning a parallel computation into finite-sized chunks for effective mapping onto a parallel machine is a critical concern for source-to-source compilation. In the context of OpenCL and CUDA, this translates to the definition of a uniform hyper-rectangular partitioning of the parallel execution space where each partition is subject to a fine-grained distribution of resources that has a direct yet hard-to-estimate impact on performance. This paper develops the first compilation scheme for generating parametrically tiled codes for affine loop programs on GPUs, which facilitates run-time exploration of partitioning parameters as a fast and portable way of finding the ones that yield maximum performance. Our approach is based on a parametric tiling scheme for producing wavefronts of parallel rectangular partitions of parametric size and a novel runtime system that manages wavefront execution and local memory usage dynamically through an inspector-executor mechanism. An experimental evaluation demonstrates the effectiveness of our approach for wavefront as well as rectangularly-parallel partitionings.
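The idea of parametric rectangular tiling executed in wavefronts can be illustrated with a small sketch. The Python code below is a minimal illustration under stated assumptions, not the authors' code generator or runtime system: it assumes a hypothetical 2-D loop nest computing `a[i][j] = a[i-1][j] + a[i][j-1]`, whose dependences point only in the +i and +j directions. The tile sizes `ti` and `tj` are ordinary run-time arguments rather than compile-time constants, and the helper names `wavefronts`, `run_tiled`, and `run_untiled` are hypothetical. Tiles whose indices sum to the same value form one wavefront and are mutually independent. Computing these tile lists at run time, before any tile executes, only loosely resembles an inspector phase, and the loops that then run them play the executor role. Local-memory management is not modeled.

```python
from typing import List, Tuple

# Hypothetical tile representation: (i_lo, i_hi, j_lo, j_hi), half-open bounds.
Tile = Tuple[int, int, int, int]


def wavefronts(n: int, m: int, ti: int, tj: int) -> List[List[Tile]]:
    """Group the ti x tj tiles of an n x m iteration space into wavefronts.

    Assumes tile (I, J) depends only on tiles (I-1, J) and (I, J-1), so all
    tiles with the same I + J are independent and could run in parallel.
    """
    nti = -(-n // ti)  # number of tile rows (ceiling division)
    ntj = -(-m // tj)  # number of tile columns (ceiling division)
    fronts: List[List[Tile]] = []
    for w in range(nti + ntj - 1):
        front: List[Tile] = []
        # Valid tiles satisfy 0 <= I < nti and 0 <= J = w - I < ntj.
        for big_i in range(max(0, w - ntj + 1), min(nti, w + 1)):
            big_j = w - big_i
            # Boundary tiles are clamped to the loop bounds (partial tiles).
            front.append((big_i * ti, min((big_i + 1) * ti, n),
                          big_j * tj, min((big_j + 1) * tj, m)))
        fronts.append(front)
    return fronts


def run_tiled(n: int, m: int, ti: int, tj: int) -> List[List[int]]:
    """Execute the hypothetical recurrence tile by tile, wavefront by wavefront."""
    a = [[0] * m for _ in range(n)]
    for front in wavefronts(n, m, ti, tj):
        # Tiles within one wavefront are independent; here they run in sequence.
        for i_lo, i_hi, j_lo, j_hi in front:
            for i in range(i_lo, i_hi):
                for j in range(j_lo, j_hi):
                    a[i][j] = 1 if i == 0 or j == 0 else a[i - 1][j] + a[i][j - 1]
    return a


def run_untiled(n: int, m: int) -> List[List[int]]:
    """Reference: the original, untiled loop nest."""
    a = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            a[i][j] = 1 if i == 0 or j == 0 else a[i - 1][j] + a[i][j - 1]
    return a


if __name__ == "__main__":
    expected = run_untiled(10, 7)
    # Tile sizes are picked at run time; the result must not depend on them.
    for ti, tj in [(1, 1), (3, 2), (4, 5), (16, 16)]:
        assert run_tiled(10, 7, ti, tj) == expected
    print("all tile sizes agree")
```

Because the tile sizes are plain arguments, trying several `(ti, tj)` pairs requires no recompilation, which is the property that makes run-time exploration of partitioning parameters possible. On a GPU, each tile in a wavefront would typically map to one work-group (OpenCL) or thread block (CUDA), with synchronization between consecutive wavefronts.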


Pages: 136-151

Number of pages: 16
