Arbitrarily Parallelizable Code: A Model of Computation Evaluated on a Message-Passing Many-Core System

Citations: 0
Authors
Cook, Sebastien [1 ]
Garcia, Paulo [2 ]
Affiliations
[1] Carleton Univ, Ottawa, ON K1S 5B6, Canada
[2] CMKL Univ, Carnegie Mellon KMITL Thailand Program, Bangkok 10520, Thailand
Keywords
graph-based programming; intermediate representation; parallelization; performance
DOI
10.3390/computers11110164
Chinese Library Classification
TP39 [Applications of computers]
Discipline codes
081203; 0835
Abstract
The number of processing elements per solution is growing. From embedded devices now employing (often heterogeneous) multi-core processors, across many-core scientific computing platforms, to distributed systems comprising thousands of interconnected processors, parallel programming of one form or another is now the norm. Understanding how to efficiently parallelize code, however, is still an open problem, and the difficulties are exacerbated across heterogeneous processing, especially at run time, when it is sometimes desirable to change the parallelization strategy to meet non-functional requirements (e.g., load balancing and power consumption). In this article, we investigate the use of a programming model based on series-parallel partial orders: computations are expressed as directed graphs that expose parallelization opportunities and necessary sequencing by construction. This programming model is suitable as an intermediate representation for higher-level languages. We then describe a model of computation for this programming model that maps such graphs into a stack-based structure more amenable to hardware processing. We describe the formal small-step semantics for this model of computation and use this formal description to show that the model can be arbitrarily parallelized, at compile time and at run time, with correct execution guaranteed by design. We empirically support this claim and evaluate parallelization benefits using a prototype open-source compiler, targeting a message-passing many-core simulation. We empirically verify the correctness of arbitrary parallelization, supporting the validity of our formal semantics; analyze the distribution of operations within cores to understand the implementation impact of the paradigm; and assess execution-time improvements when five micro-benchmarks are automatically and randomly parallelized across 2×2 and 4×4 multi-core configurations, resulting in an execution-time decrease of up to 95% in the best case.
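The series-parallel model the abstract describes can be illustrated with a minimal, hypothetical sketch (not the paper's compiler, semantics, or stack-based mapping): computations are built from sequential and parallel compositions, and because parallel branches share no dependency, any schedule of them, sequential or concurrent, yields the same result. All names below (`Leaf`, `Seq`, `Par`, `run`) are invented for illustration.

```python
# Hypothetical sketch of a series-parallel computation graph.
# Par branches are independent by construction, so running them
# sequentially or on a thread pool produces the same value --
# the schedule-invariance the paper's semantics guarantees by design.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

@dataclass
class Leaf:
    f: Callable          # a primitive operation

@dataclass
class Seq:
    first: object        # 'second' consumes the output of 'first'
    second: object

@dataclass
class Par:
    left: object         # independent branches: no mutual dependency
    right: object
    join: Callable       # combines the two branch results

def run(node, x, pool=None):
    """Evaluate a series-parallel graph on input x.

    With pool=None, Par branches run sequentially; with a
    ThreadPoolExecutor, they run concurrently. Same result either way.
    """
    if isinstance(node, Leaf):
        return node.f(x)
    if isinstance(node, Seq):
        return run(node.second, run(node.first, x, pool), pool)
    if isinstance(node, Par):
        if pool is None:
            return node.join(run(node.left, x), run(node.right, x))
        fl = pool.submit(run, node.left, x)    # branches evaluated
        fr = pool.submit(run, node.right, x)   # concurrently
        return node.join(fl.result(), fr.result())
    raise TypeError(f"not a series-parallel node: {node!r}")

# (x + 1), then in parallel double it and square it, then sum the branches.
g = Seq(Leaf(lambda x: x + 1),
        Par(Leaf(lambda x: 2 * x),
            Leaf(lambda x: x * x),
            join=lambda a, b: a + b))

with ThreadPoolExecutor(max_workers=2) as pool:
    assert run(g, 3) == run(g, 3, pool) == 24   # 2*4 + 4*4
```

The point of the sketch is the assertion on the last line: the choice of schedule is free, so a compiler or runtime may repartition the graph across cores at will without affecting the computed result.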
Pages: 20