Workload Interference Prevention with Intelligent Routing and Flexible Job Placement on Dragonfly

Authors
Kang, Yao [1]
Wang, Xin [1]
Lan, Zhiling [1]
Affiliation
[1] IIT, Chicago, IL 60616 USA
Funding
U.S. National Science Foundation
Keywords
high performance computing; interconnect networking; parallel discrete event simulation
DOI
10.1145/3573900.3591119
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Dragonfly is an indispensable interconnect topology for exascale HPC systems. To link tens of thousands of compute nodes at a reasonable cost, Dragonfly shares network resources across the entire system, so network bandwidth is not exclusive to any single job. Because HPC systems are usually shared by multiple concurrently running workloads, network contention between co-existing workloads is inevitable. This contention manifests as workload interference, in which a job's network communication can be severely delayed by other jobs. Recent studies show that, compared with deployed adaptive routing algorithms, Q-adaptive routing, an intelligent routing solution based on reinforcement learning, can reduce workload interference. Beyond improving routing efficiency, job placement is a simple yet effective way to mitigate workload interference. In this study, we leverage SST, a well-known parallel discrete event simulation toolkit, to investigate workload interference on Dragonfly, with three contributions. First, we develop an automation module that bridges SST and an HPC job scheduler, automatically generating simulation configurations and launching simulations. Next, we propose a flexible job placement strategy that mitigates workload interference based on workload communication characteristics. Finally, we extensively examine workload interference under various job placement and routing configurations.
Pages: 23–33
Number of pages: 11
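The abstract presents job placement as a simple lever for limiting interference alongside routing. As a rough illustration only, the sketch below compares two common baseline placement policies, contiguous and random, on a toy Dragonfly. The machine size (4 groups × 4 routers × 2 nodes per router) and all function names are hypothetical assumptions. The sketch does not reproduce the authors' flexible placement strategy or their SST-based tooling.

```python
import random
from typing import Dict, List, Sequence, Tuple

# A compute node is identified as (group, router, node-on-router).
Node = Tuple[int, int, int]


def dragonfly_nodes(groups: int, routers_per_group: int, nodes_per_router: int) -> List[Node]:
    """Enumerate all compute nodes of a toy Dragonfly in index order."""
    return [(g, r, n)
            for g in range(groups)
            for r in range(routers_per_group)
            for n in range(nodes_per_router)]


def contiguous_placement(free: List[Node], size: int) -> List[Node]:
    """Take the first `size` free nodes in index order, packing the job
    into as few routers and groups as possible."""
    if size > len(free):
        raise ValueError("not enough free nodes")
    return free[:size]


def random_placement(free: List[Node], size: int, seed: int = 0) -> List[Node]:
    """Pick `size` free nodes uniformly at random, spreading the job
    across groups."""
    if size > len(free):
        raise ValueError("not enough free nodes")
    return random.Random(seed).sample(free, size)


def place_jobs(job_sizes: Sequence[int], policy, groups: int = 4,
               routers_per_group: int = 4, nodes_per_router: int = 2) -> Dict[int, List[Node]]:
    """Place jobs one after another with the given policy; each job's
    nodes are removed from the free pool before the next job is placed."""
    free = dragonfly_nodes(groups, routers_per_group, nodes_per_router)
    placement: Dict[int, List[Node]] = {}
    for job_id, size in enumerate(job_sizes):
        nodes = policy(free, size)
        placement[job_id] = nodes
        taken = set(nodes)
        free = [node for node in free if node not in taken]
    return placement


def groups_spanned(nodes: List[Node]) -> int:
    """Number of distinct Dragonfly groups a job occupies."""
    return len({g for g, _, _ in nodes})


if __name__ == "__main__":
    for name, policy in [("contiguous", contiguous_placement), ("random", random_placement)]:
        jobs = place_jobs([8, 8], policy)
        print(name, {job: groups_spanned(nodes) for job, nodes in jobs.items()})
```

With these toy parameters, contiguous placement puts each 8-node job inside a single group, while random placement typically scatters a job over several groups. Roughly speaking, packing keeps a job's traffic local but concentrates it on few links, whereas spreading balances load across global links at the cost of sharing them with other jobs. This trade-off is the kind of design space a communication-aware placement strategy has to navigate.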