Scheduling Jobs across Geo-Distributed Datacenters with Max-Min Fairness

Cited by: 30
Authors
Chen, Li [1]
Liu, Shuhao [1]
Li, Baochun [1]
Li, Bo [2]
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S3G4, Canada
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Geo-distributed datacenter networks; wide-area big data analytics; scheduling; fairness
DOI
10.1109/TNSE.2018.2795580
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
It has become routine for large volumes of data to be generated, stored, and processed across geographically distributed datacenters. To run a single data analytic job on such geo-distributed data, recent research proposed to distribute its tasks across datacenters, considering both data locality and network bandwidth across datacenters. Yet, it remains an open problem in the more general case, where multiple analytic jobs need to fairly share the resources at these geo-distributed datacenters. In this paper, we focus on the problem of assigning tasks belonging to multiple jobs across datacenters, with the specific objective of achieving max-min fairness across jobs sharing these datacenters, in terms of their job completion times. We formulate this problem as a lexicographical minimization problem, which is challenging to solve in practice due to its inherent multi-objective and discrete nature. To address these challenges, we iteratively solve its single-objective subproblems, which can be transformed to equivalent linear programming (LP) problems to be efficiently solved, thanks to their favorable properties. As a highlight of this paper, we have designed and implemented our proposed solution as a fair job scheduler based on Apache Spark, a modern data processing framework. With extensive evaluations of our real-world implementation on Amazon EC2 and large-scale simulations, we have shown convincing evidence that max-min fairness has been achieved and the worst job completion time has been significantly improved using our new job scheduler.
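To make the objective in the abstract concrete, the following is a minimal sketch of lexicographic (max-min fair) minimization of job completion times on a toy instance. It is not the authors' LP-based method: it enumerates every assignment by brute force, which is only feasible for tiny inputs. The function name `lexmin_assignment`, the `times` table, and the per-datacenter `capacity` slots are all hypothetical and chosen for illustration.

```python
from itertools import product


def lexmin_assignment(times, capacity):
    """Brute-force lexicographic minimization of job completion times.

    times[j][t][d] is a hypothetical completion time of task t of job j
    when placed at datacenter d; capacity[d] is the number of task slots
    at datacenter d. Returns (completion times sorted worst-first,
    assignment mapping (job, task) -> datacenter).
    """
    tasks = [(j, t) for j, job in enumerate(times) for t in range(len(job))]
    n_dc = len(capacity)
    best = None
    for choice in product(range(n_dc), repeat=len(tasks)):
        # Skip assignments that exceed any datacenter's slot capacity.
        if any(choice.count(d) > capacity[d] for d in range(n_dc)):
            continue
        # A job completes when its slowest task completes.
        jct = [0] * len(times)
        for (j, t), d in zip(tasks, choice):
            jct[j] = max(jct[j], times[j][t][d])
        # Comparing worst-first vectors lexicographically minimizes the
        # worst completion time first, then the second worst, and so on.
        key = sorted(jct, reverse=True)
        if best is None or key < best[0]:
            best = (key, dict(zip(tasks, choice)))
    return best


# Toy instance (assumed values): two jobs, two datacenters.
toy_times = [
    [[1, 4], [2, 3]],  # job 0: two tasks, times at datacenter 0 and 1
    [[2, 6]],          # job 1: one task
]
toy_capacity = [2, 1]
worst_first, placement = lexmin_assignment(toy_times, toy_capacity)
print(worst_first, placement)
```

In this toy instance exactly one task must run at datacenter 1, and the fairest choice moves the second task of job 0 there, giving completion times of 3 and 2. The approach described in the abstract avoids this enumeration by solving a sequence of single-objective subproblems as linear programs.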
Pages: 488-500
Page count: 13
Related papers
50 in total
  • [1] Scheduling Jobs across Geo-Distributed Datacenters with Max-Min Fairness
    Chen, Li
    Liu, Shuhao
    Li, Baochun
    Li, Bo
    [J]. IEEE INFOCOM 2017 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2017,
  • [2] MIN-Max-Min: A Heuristic Scheduling Algorithm for Jobs Across Geo-distributed Datacenters
    Li, Yan
    Zhu, Chunge
    Wang, Yong
    [J]. 2018 IEEE 38TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2018, : 1573 - 1574
  • [3] Scheduling Jobs Across Geo-distributed Datacenters
    Hung, Chien-Chun
    Golubchik, Leana
    Yu, Minlan
    [J]. ACM SOCC'15: PROCEEDINGS OF THE SIXTH ACM SYMPOSIUM ON CLOUD COMPUTING, 2015, : 111 - 124
  • [4] Distributed WFQ scheduling converging to weighted max-min fairness
    Chrysos, Nikolaos
    Katevenis, Manolis
    [J]. COMPUTER NETWORKS, 2011, 55 (03) : 792 - 806
  • [5] Flutter: Scheduling Tasks Closer to Data Across Geo-Distributed Datacenters
    Hu, Zhiming
    Li, Baochun
    Luo, Jun
    [J]. IEEE INFOCOM 2016 - THE 35TH ANNUAL IEEE INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS, 2016,
  • [6] Endpoint-Flexible Coflow Scheduling Across Geo-Distributed Datacenters
    Li, Wenxin
    Yuan, Xu
    Li, Keqiu
    Qi, Heng
    Zhou, Xiaobo
    Xu, Renhai
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (10) : 2466 - 2481
  • [7] Optimizing Network Transfers for Data Analytic Jobs Across Geo-Distributed Datacenters
    Chen, Li
    Liu, Shuhao
    Li, Baochun
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (02) : 403 - 414
  • [8] FairShare: Dynamic Max-Min Fairness Bandwidth Allocation in Datacenters
    Tian, Jianbang
    Qian, Zhuzhong
    Dong, Mianxiong
    Lu, Sanglu
    [J]. 2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 1463 - 1470
  • [9] Leveraging Endpoint Flexibility When Scheduling Coflows across Geo-distributed Datacenters
    Li, Wenxin
    Yuan, Xu
    Li, Keqiu
    Qi, Heng
    Zhou, Xiaobo
    [J]. IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2018), 2018, : 873 - 881
  • [10] MAST: Global Scheduling of ML Training across Geo-Distributed Datacenters at Hyperscale
    Choudhury, Arnab
    Wang, Yang
    Pelkonen, Tuomas
    Srinivasan, Kutta
    Jain, Abha
    Lin, Shenghao
    David, Delia
    Soleimanifard, Siavash
    Chen, Michael
    Yadav, Abhishek
    Tijoriwala, Ritesh
    Samoylov, Denis
    Tang, Chunqiang
    [J]. PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2024, 2024, : 563 - 580