Raythena: a vertically integrated scheduler for ATLAS applications on heterogeneous distributed resources

Cited by: 1
Authors
Muskinja, Miha [1]
Calafiura, Paolo [1]
Leggett, Charles [1]
Shapoval, Illya [1]
Tsulaia, Vakho [1]
Affiliations
[1] Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Keywords
DOI
10.1051/epjconf/202024505042
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
The ATLAS experiment has successfully integrated High-Performance Computing resources (HPCs) in its production system. Unlike the current generation of HPC systems and the LHC computing grid, the next generation of supercomputers is expected to be extremely heterogeneous in nature: different systems will have radically different architectures, and most of them will provide partitions optimized for different kinds of workloads. In this work we explore the applicability of concepts and tools realized in Ray (the high-performance distributed execution framework targeting large-scale machine learning applications) to ATLAS event throughput optimization on heterogeneous distributed resources, ranging from traditional grid clusters to Exascale computers. We present a prototype of Raythena, a Ray-based implementation of the ATLAS Event Service (AES), a fine-grained event processing workflow aimed at improving the efficiency of ATLAS workflows on opportunistic resources, specifically HPCs. The AES is implemented as an event processing task farm that distributes packets of events to several worker processes running on multiple nodes. Each worker in the task farm runs an event-processing application (Athena) as a daemon. The whole system is orchestrated by Ray, which assigns work in a distributed, possibly heterogeneous, environment. For all its flexibility, the AES implementation currently comprises multiple separate layers that communicate through ad-hoc command-line and file-based interfaces. The goal of Raythena is to integrate these layers through a feature-rich, efficient application framework. Besides increasing usability and robustness, a vertically integrated scheduler will enable us to explore advanced concepts such as dynamic shaping of workflows to exploit currently available resources, particularly on heterogeneous systems.
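The task-farm pattern described in the abstract (packets of events handed out to a pool of workers) can be illustrated with a minimal sketch. This is not Raythena's code: it uses Python's standard `concurrent.futures` thread pool as a stand-in for Ray, and the names `make_event_ranges`, `process_range`, and `run_task_farm` are hypothetical. The worker only counts events, where a real worker would hand the packet to an Athena process.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple

# A packet of events, as a half-open range [first, last). Hypothetical type.
EventRange = Tuple[int, int]


def make_event_ranges(n_events: int, packet_size: int) -> List[EventRange]:
    """Split n_events into consecutive packets of at most packet_size events."""
    return [
        (start, min(start + packet_size, n_events))
        for start in range(0, n_events, packet_size)
    ]


def process_range(event_range: EventRange) -> Tuple[EventRange, int]:
    """Stand-in worker: returns the packet and the number of events in it.

    Assumption: a real worker would forward the packet to an
    event-processing application instead of just counting.
    """
    first, last = event_range
    return event_range, last - first


def run_task_farm(n_events: int, packet_size: int, n_workers: int) -> int:
    """Distribute packets to a pool of workers and total the processed events."""
    processed = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(process_range, r)
            for r in make_event_ranges(n_events, packet_size)
        ]
        # Collect results as workers finish, in completion order.
        for future in as_completed(futures):
            _, count = future.result()
            processed += count
    return processed


if __name__ == "__main__":
    print(run_task_farm(n_events=1000, packet_size=64, n_workers=4))
```

In a Ray-based setting, a function like `process_range` could plausibly be declared with `@ray.remote` and its results gathered with `ray.get`; how Raythena actually structures its workers is not stated in this record.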
Pages: 6
Related papers
50 items in total
  • [1] Towards a distributed heterogeneous task scheduler for the ATLAS offline software framework
    Calafiura, Paolo
    Esseiva, Julien
    Ju, Xiangyang
    Leggett, Charles
    Stanislaus, Beojan
    Tsulaia, Vakho
    26TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS, CHEP 2023, 2024, 295
  • [2] Job scheduler for streaming applications in heterogeneous distributed processing systems
    Al-Sinayyid, Ali
    Zhu, Michelle
    JOURNAL OF SUPERCOMPUTING, 2020, 76 (12): : 9609 - 9628
  • [3] Job scheduler for streaming applications in heterogeneous distributed processing systems
    Ali Al-Sinayyid
    Michelle Zhu
    The Journal of Supercomputing, 2020, 76 : 9609 - 9628
  • [4] Self Adaptive Hadoop Scheduler for Heterogeneous Resources
    Elkholy, Amr M.
    Sallam, Elsayed A. H.
    2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2014, : 427 - 432
  • [5] Research on the realization of distributed and heterogeneous information resources integrated system
    Qi, Hui-Ying
    Wang, Xin
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2010, 42 (11): : 1838 - 1841
  • [6] SHRED: a CPU scheduler for heterogeneous applications
    Moonian, O
    Coulson, G
    Embedded Processors for Multimedia and Communications II, 2005, 5683 : 132 - 143
  • [7] JarvSis: a distributed scheduler for IoT applications
    M. De Benedetti
    F. Messina
    G. Pappalardo
    C. Santoro
    Cluster Computing, 2017, 20 : 1775 - 1790
  • [8] JarvSis: a distributed scheduler for IoT applications
    De Benedetti, M.
    Messina, F.
    Pappalardo, G.
    Santoro, C.
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2017, 20 (02): : 1775 - 1790
  • [9] A two-stage scheduler of distributed energy resources
    Borghetti, Alberto
    Bosetti, Mauro
    Grillo, Samuele
    Morini, Andrea
    Paolone, Mario
    Silvestro, Federico
    2007 IEEE LAUSANNE POWERTECH, VOLS 1-5, 2007, : 2168 - +
  • [10] Indexing Distributed and Heterogeneous Resources
    Chromiak, Michal
    Stencel, Krzysztof
    Subieta, Kazimierz
    U- AND E-SERVICE, SCIENCE AND TECHNOLOGY, 2010, 124 : 214 - +