Workload Interference Prevention with Intelligent Routing and Flexible Job Placement on Dragonfly

Cited by: 0
Authors
Kang, Yao [1]
Wang, Xin [1]
Lan, Zhiling [1]
Affiliations
[1] IIT, Chicago, IL 60616 USA
Funding
U.S. National Science Foundation
Keywords
high performance computing; interconnect networking; parallel discrete event simulation
DOI
10.1145/3573900.3591119
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Dragonfly is an indispensable interconnect topology for exascale HPC systems. To link tens of thousands of compute nodes at a reasonable cost, Dragonfly shares network resources across the entire system, so network bandwidth is not exclusive to any single job. Because HPC systems usually run multiple workloads concurrently, network contention between co-running workloads is inevitable. This contention appears as workload interference, in which a job's network communication can be severely delayed by other jobs. Recent studies show that, compared with the deployed adaptive routing algorithms, Q-adaptive routing, an intelligent routing solution based on reinforcement learning, can reduce workload interference. Beyond more efficient routing, job placement is a simple yet effective way to mitigate workload interference. In this study, we use the well-known parallel discrete event simulation toolkit SST to investigate workload interference on Dragonfly and make three contributions. First, we develop a module that bridges SST and an HPC job scheduler to automate simulation configuration and launching. Next, we propose a flexible job placement strategy that mitigates workload interference based on workload communication characteristics. Finally, we extensively examine workload interference under various job placement and routing configurations.
Pages: 23-33
Number of pages: 11