A Fine-grained Asynchronous Bulk Synchronous parallelism model for PGAS applications

Cited by: 3
Authors
Paul, Sri Raj [1]
Hayashi, Akihiro [2]
Chen, Kun [6]
Elmougy, Youssef [3]
Sarkar, Vivek [4, 5]
Affiliations
[1] Intel Corp, Austin, TX 78746 USA
[2] Georgia Inst Technol, Atlanta, GA USA
[3] Georgia Inst Technol, Comp Sci, Atlanta, GA USA
[4] Georgia Inst Technol, Coll Comp, Sch Comp Sci, Atlanta, GA USA
[5] Georgia Inst Technol, Coll Comp, Telecommun, Atlanta, GA USA
[6] Meta, Menlo Pk, CA USA
Keywords
Actors; Communication aggregation; Conveyors; Bale; PGAS; OpenSHMEM; Selectors; Irregular applications; Large scale graph analytics;
DOI
10.1016/j.jocs.2023.102014
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
The Partitioned Global Address Space (PGAS) model is well suited for executing irregular applications on cluster-based systems, due to its efficient support for short, one-sided messages. Separately, the actor model has been gaining popularity as a productive asynchronous message-passing approach for distributed objects in enterprise and cloud computing platforms, typically implemented in languages such as Erlang, Scala or Rust. To the best of our knowledge, there has been no past work on using the actor model to deliver both productivity and scalability to irregular PGAS applications with large numbers of small messages. In this paper, we introduce a new programming system for PGAS applications, in which point-to-point remote operations can be expressed as fine-grained asynchronous actor messages. In our approach, the programmer does not need to worry about programming complexities related to message aggregation and termination detection. Our approach can be viewed as extending the classical Bulk Synchronous Parallelism model with fine-grained asynchronous communications within a phase or superstep. We believe that our approach offers a desirable point in the productivity-performance space for PGAS applications, with more scalable performance and higher productivity relative to past approaches. Specifically, for seven irregular mini-applications from the Bale Kernels and three graph kernels executed using 2048 cores in the NERSC Cori system, our approach shows geometric mean performance improvements of >= symbolscript relative to standard PGAS versions (UPC and OpenSHMEM) while maintaining comparable productivity to those versions. This is an extended version of the conference paper "A Productive and Scalable Actor-Based Programming System for PGAS Applications" (Paul et al., 2022) [1] from ICCS 2022.
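The idea of fine-grained asynchronous messages inside a superstep, with aggregation and termination handled by the runtime, can be illustrated with a minimal single-process sketch. This is not the authors' system or API: `ToyRuntime`, `send`, `finish`, and `histogram` are hypothetical names chosen for illustration, and the "network" is just an in-memory queue. The sketch assumes a histogram-style kernel in which global index `i` is owned by PE `i % num_pes`.

```python
from collections import defaultdict, deque


class ToyRuntime:
    """Single-process stand-in for a set of PEs (hypothetical, illustration only)."""

    def __init__(self, num_pes, handler, buffer_size=4):
        self.num_pes = num_pes
        self.handler = handler            # called as handler(runtime, dest_pe, payload)
        self.buffer_size = buffer_size
        # One outgoing buffer per (source, destination) pair: the aggregation step.
        self.buffers = defaultdict(list)
        self.network = deque()            # batches "in flight"
        self.batches_sent = 0

    def send(self, src, dest, payload):
        """Fine-grained asynchronous send: appends to a local buffer and returns."""
        buf = self.buffers[(src, dest)]
        buf.append(payload)
        if len(buf) >= self.buffer_size:
            self._flush(src, dest)

    def _flush(self, src, dest):
        """Ship one aggregated batch instead of many small messages."""
        batch = self.buffers.pop((src, dest), [])
        if batch:
            self.network.append((dest, batch))
            self.batches_sent += 1

    def finish(self):
        """End of superstep: drain until nothing is buffered or in flight."""
        while self.buffers or self.network:
            for src, dest in list(self.buffers):
                self._flush(src, dest)
            while self.network:
                dest, batch = self.network.popleft()
                for payload in batch:
                    # A handler may itself call send(); the outer loop picks that up.
                    self.handler(self, dest, payload)


def histogram(num_pes, table_size, updates_per_pe):
    """Distributed-histogram toy: each update becomes one actor-style message."""
    slots = (table_size + num_pes - 1) // num_pes
    tables = [[0] * slots for _ in range(num_pes)]

    def on_message(runtime, dest_pe, local_index):
        tables[dest_pe][local_index] += 1

    rt = ToyRuntime(num_pes, on_message)
    for src, indices in enumerate(updates_per_pe):
        for i in indices:
            rt.send(src, i % num_pes, i // num_pes)
    rt.finish()                           # superstep boundary
    return tables, rt.batches_sent


if __name__ == "__main__":
    tables, batches = histogram(2, 4, [[0, 1, 1, 3], [1, 1, 2, 1]])
    print(tables, batches)
```

In this sketch the application code only calls `send` and `finish`; buffering per destination and the drain loop that detects quiescence stay inside the runtime, which is the division of labor the abstract describes.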
Pages: 13