A Heterogeneous MPI plus PPL Task Scheduling Approach for Asynchronous Many-Task Runtime Systems

Cited by: 1
Authors
Holmen, John K. [1]
Sahasrabudhe, Damodar [1]
Berzins, Martin [1]
Affiliations
[1] Univ Utah, SCI Inst, Salt Lake City, UT 84112 USA
Keywords
Asynchronous Many-Task Runtime System; Performance Portability; Parallelism and Concurrency; Portability; Software Engineering; PERFORMANCE;
DOI
10.1145/3437359.3465581
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Asynchronous many-task runtime systems and MPI+X hybrid parallelism approaches have shown promise for helping manage the increasing complexity of nodes in current and emerging high performance computing (HPC) systems, including those for exascale. The increasing architectural diversity of these systems, however, poses challenges for runtimes supporting more homogeneous HPC systems. Performance portability layers (PPL) have shown promise for helping manage this diversity. This paper describes a heterogeneous MPI+PPL task scheduling approach for combining these promising solutions with additional consideration for parallel third party libraries facing similar challenges to help prepare such a runtime for the diverse heterogeneous systems accompanying exascale computing. This approach is demonstrated using a heterogeneous MPI+Kokkos task scheduler and the accompanying portable abstractions [16] implemented in the Uintah Computational Framework, an asynchronous many-task runtime system, with additional consideration for hypre, a parallel third party library. Results are shown for two challenging problems executing workloads representative of typical Uintah applications. These results show performance improvements up to 4.4x when using this scheduler and the accompanying portable abstractions [16] to port a previously MPI-Only problem to Kokkos::OpenMP and Kokkos::CUDA to improve complex heterogeneous node use. Good strong-scaling to 1,024 NVIDIA V100 GPUs and 512 IBM POWER9 processors is also shown using MPI+Kokkos::OpenMP+Kokkos::CUDA at scale.
Pages: 9
Related papers
50 in total
  • [21] Towards Distributed Software Resilience in Asynchronous Many-Task Programming Models
    Gupta, Nikunj
    Mayo, Jackson R.
    Lemoine, Adrian S.
    Kaiser, Hartmut
    PROCEEDINGS OF 2020 IEEE/ACM 10TH WORKSHOP ON FAULT TOLERANCE FOR HPC AT EXTREME SCALE (FTXS 2020), 2020, : 11 - 20
  • [22] Runtime Adaptive Task Inlining on Asynchronous Multitasking Runtime Systems
    Wagle, Bibek
    Monil, Mohammad Alaul Haque
    Huck, Kevin
    Malony, Allen D.
    Serio, Adrian
    Kaiser, Hartmut
    PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019), 2019,
  • [23] Disk Cache-Aware Task Scheduling For Data-Intensive and Many-Task Workflow
    Tanaka, Masahiro
    Tatebe, Osamu
    2014 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2014, : 167 - 175
  • [24] Scheduling Many-Task Applications on Multi-clouds and Hybrid Clouds
    Mithila, Shifat P.
    Franz, Peter
    Baumgartner, Gerald
    ASYNCHRONOUS MANY-TASK SYSTEMS AND APPLICATIONS, WAMTA 2023, 2023, 13861 : 65 - 78
  • [25] Task scheduling for heterogeneous systems using an incremental approach
    Khan, Minhaj Ahmad
    JOURNAL OF SUPERCOMPUTING, 2017, 73 : 1905 - 1928
  • [26] Efficient task scheduling for runtime reconfigurable systems
    Fazlali, Mahmood
    Sabeghi, Mojtaba
    Zakerolhosseini, Ali
    Bertels, Koen
    JOURNAL OF SYSTEMS ARCHITECTURE, 2010, 56 (11) : 623 - 632
  • [27] Task scheduling for heterogeneous systems using an incremental approach
    Khan, Minhaj Ahmad
    JOURNAL OF SUPERCOMPUTING, 2017, 73 (05) : 1905 - 1928
  • [28] New approach to allocation planning of many-task workflows on clouds
    Gerhards, Michael
    Sander, Volker
    Zivkovic, Miroslav
    Belloum, Adam
    Bubak, Marian
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (02):
  • [29] Evaluating Dynamic Task Scheduling in a Task-Based Runtime System for Heterogeneous Architectures
    Becker, Thomas
    Karl, Wolfgang
    Schuele, Tobias
    ARCHITECTURE OF COMPUTING SYSTEMS - ARCS 2019, 2019, 11479 : 142 - 155
  • [30] Lifeline-based load balancing schemes for Asynchronous Many-Task runtimes in clusters
    Reitz, Lukas
    Hardenbicker, Kai
    Werner, Tobias
    Fohry, Claudia
    PARALLEL COMPUTING, 2023, 116