Large-Scale Experiment for Topology-Aware Resource Management

Cited by: 0
Authors
Georgiou, Yiannis [1]
Mercier, Guillaume [2]
Villiermet, Adele [3]
Affiliations
[1] Atos Bull, Grenoble, France
[2] Bordeaux INP, Talence, France
[3] Inria Bordeaux Sud Ouest, Talence, France
Source
EURO-PAR 2017: PARALLEL PROCESSING WORKSHOPS | 2018 / Vol. 10659
Keywords
Resource management; Job allocation; Topology-aware placement; Scheduling; SLURM; PLACEMENT;
DOI
10.1007/978-3-319-75178-8_15
Chinese Library Classification
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
A Resource and Job Management System (RJMS) is a crucial piece of system software in the HPC stack. It is responsible for efficiently delivering computing power to applications in supercomputing environments, and its main intelligence lies in the resource selection techniques it uses to find the most suitable resources on which to schedule users' jobs. In [8], we introduced a new topology-aware resource selection algorithm that determines the best choice among the available nodes of the platform based on their position in the network and on application behaviour (expressed as a communication matrix). We integrated this algorithm as a plugin in SLURM and validated it with several optimization schemes by comparing it with the default SLURM algorithm. This paper presents further experiments with regard to this selection process.
Pages: 179-186
Page count: 8
Related papers
50 in total
  • [41] RAQNet: A topology-aware overlay network
    Mirrezaei, Seyed Iman
    Shahparian, Javad
    Ghodsi, Mohammad
    INTER-DOMAIN MANAGEMENT, PROCEEDINGS, 2007, 4543 : 13 - +
  • [42] Topology-aware Generalization of Decentralized SGD
    Zhu, Tongtian
    He, Fengxiang
    Zhang, Lan
    Niu, Zhengyang
    Song, Mingli
    Tao, Dacheng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [43] Automatic Graph Topology-Aware Transformer
    Wang, Chao
    Zhao, Jiaxuan
    Li, Lingling
    Jiao, Licheng
    Liu, Fang
    Yang, Shuyuan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [44] Topology-aware overlay path probing
    Tang, Chiping
    McKinley, Philip K.
    COMPUTER COMMUNICATIONS, 2007, 30 (09) : 1994 - 2009
  • [45] Topology-Aware Uncertainty for Image Segmentation
    Gupta, Saumya
    Zhang, Yikai
    Hu, Xiaoling
    Prasanna, Prateek
    Chen, Chao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [46] Topology-Aware Graph Pooling Networks
    Gao, Hongyang
    Liu, Yi
    Ji, Shuiwang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (12) : 4512 - 4518
  • [47] A Topology-Aware Framework for Graph Traversals
    Meng, Jia
    Cao, Liang
    Yu, Huashan
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2017, 2017, 10393 : 165 - 179
  • [48] Autonomous Resource-Aware Scheduling of Large-Scale Media Workflows
    Desmet, Stein
    Volckaert, Bruno
    De Turck, Filip
    MECHANISMS FOR AUTONOMOUS MANAGEMENT OF NETWORKS AND SERVICES, 2010, 6155 : 50 - 64
  • [49] Performance-Aware Speculative Resource Oversubscription for Large-Scale Clusters
    Yang, Renyu
    Hu, Chunming
    Sun, Xiaoyang
    Garraghan, Peter
    Wo, Tianyu
    Wen, Zhenyu
    Peng, Hao
    Xu, Jie
    Li, Chao
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (07) : 1499 - 1517
  • [50] Designing Topology-Aware Communication Schedules for Alltoall Operations in Large InfiniBand Clusters
    Subramoni, H.
    Kandalla, K.
    Jose, J.
    Tomko, K.
    Schulz, K.
    Pekurovsky, D.
    Panda, D. K.
    2014 43RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2014, : 231 - 240