Arbitrarily Parallelizable Code: A Model of Computation Evaluated on a Message-Passing Many-Core System

Citations: 0
Authors
Cook, Sebastien [1 ]
Garcia, Paulo [2 ]
Affiliations
[1] Carleton Univ, Ottawa, ON K1S 5B6, Canada
[2] CMKL Univ, Carnegie Mellon KMITL Thailand Program, Bangkok 10520, Thailand
Keywords
graph-based programming; intermediate representation; parallelization; performance
DOI
10.3390/computers11110164
Chinese Library Classification
TP39 [Applications of computers]
Discipline codes
081203; 0835
Abstract
The number of processing elements per solution is growing. From embedded devices now employing (often heterogeneous) multi-core processors, across many-core scientific computing platforms, to distributed systems comprising thousands of interconnected processors, parallel programming of one form or another is now the norm. Understanding how to efficiently parallelize code, however, is still an open problem, and the difficulties are exacerbated across heterogeneous processing, especially at run time, when it is sometimes desirable to change the parallelization strategy to meet non-functional requirements (e.g., load balancing and power consumption). In this article, we investigate the use of a programming model based on series-parallel partial orders: computations are expressed as directed graphs that expose parallelization opportunities and necessary sequencing by construction. This programming model is suitable as an intermediate representation for higher-level languages. We then describe a model of computation for this programming model that maps such graphs into a stack-based structure more amenable to hardware processing. We describe the formal small-step semantics for this model of computation and use this formal description to show that the model can be arbitrarily parallelized, at compile time and at run time, with correct execution guaranteed by design. We empirically support this claim and evaluate parallelization benefits using a prototype open-source compiler, targeting a message-passing many-core simulation. We empirically verify the correctness of arbitrary parallelization, supporting the validity of our formal semantics; analyze the distribution of operations within cores to understand the implementation impact of the paradigm; and assess execution-time improvements when five micro-benchmarks are automatically and randomly parallelized across 2×2 and 4×4 multi-core configurations, resulting in an execution-time decrease of up to 95% in the best case.
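The series-parallel model the abstract describes can be illustrated with a minimal, hypothetical sketch (not the paper's compiler, semantics, or stack-based mapping): computations are built from sequential and parallel compositions, and because parallel branches share no dependency, any schedule of them, sequential or concurrent, yields the same result. All names below (`Leaf`, `Seq`, `Par`, `run`) are invented for illustration.

```python
# Hypothetical sketch of a series-parallel computation graph.
# Par branches are independent by construction, so running them
# sequentially or on a thread pool produces the same value --
# the schedule-invariance the paper's semantics guarantees by design.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

@dataclass
class Leaf:
    f: Callable          # a primitive operation

@dataclass
class Seq:
    first: object        # 'second' consumes the output of 'first'
    second: object

@dataclass
class Par:
    left: object         # independent branches: no mutual dependency
    right: object
    join: Callable       # combines the two branch results

def run(node, x, pool=None):
    """Evaluate a series-parallel graph on input x.

    With pool=None, Par branches run sequentially; with a
    ThreadPoolExecutor, they run concurrently. Same result either way.
    """
    if isinstance(node, Leaf):
        return node.f(x)
    if isinstance(node, Seq):
        return run(node.second, run(node.first, x, pool), pool)
    if isinstance(node, Par):
        if pool is None:
            return node.join(run(node.left, x), run(node.right, x))
        fl = pool.submit(run, node.left, x)    # branches evaluated
        fr = pool.submit(run, node.right, x)   # concurrently
        return node.join(fl.result(), fr.result())
    raise TypeError(f"not a series-parallel node: {node!r}")

# (x + 1), then in parallel double it and square it, then sum the branches.
g = Seq(Leaf(lambda x: x + 1),
        Par(Leaf(lambda x: 2 * x),
            Leaf(lambda x: x * x),
            join=lambda a, b: a + b))

with ThreadPoolExecutor(max_workers=2) as pool:
    assert run(g, 3) == run(g, 3, pool) == 24   # 2*4 + 4*4
```

The point of the sketch is the assertion on the last line: the choice of schedule is free, so a compiler or runtime may repartition the graph across cores at will without affecting the computed result.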
Pages: 20