Flutter: Scheduling Tasks Closer to Data Across Geo-Distributed Datacenters

Cited by: 0
Authors
Hu, Zhiming [1]
Li, Baochun [2]
Luo, Jun [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
[2] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 1A1, Canada
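Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract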
Typically called big data processing, processing large volumes of data from geographically distributed regions with machine learning algorithms has emerged as an important analytical tool for governments and multinational corporations. The traditional wisdom calls for the collection of all the data across the world to a central datacenter location, to be processed using data-parallel applications. This is neither efficient nor practical as the volume of data grows exponentially. Rather than transferring data, we believe that computation tasks should be scheduled where the data is, while data should be processed with a minimum amount of transfers across datacenters. In this paper, we design and implement Flutter, a new task scheduling algorithm that improves the completion times of big data processing jobs across geographically distributed datacenters. To cater to the specific characteristics of data-parallel applications, we first formulate our problem as a lexicographical min-max integer linear programming (ILP) problem, and then transform it into a nonlinear program with a separable convex objective function and a totally unimodular constraint matrix, which can be solved using a standard linear programming solver efficiently in an online fashion. Our implementation of Flutter is based on Apache Spark, a modern framework popular for big data processing. Our experimental results have shown that we can reduce the job completion time by up to 25%, and the amount of traffic transferred among datacenters by up to 75%.
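The lexicographical min-max objective mentioned in the abstract can be illustrated with a minimal sketch. It is not Flutter's algorithm: the abstract states that the authors transform the problem so that a standard linear programming solver can handle it, whereas this sketch simply enumerates every assignment of a tiny instance by brute force. The function name `lexi_min_max_assignment`, the `cost` matrix, and its values are hypothetical, chosen only to show what "lexicographically smallest vector of completion times" means.

```python
from itertools import permutations


def lexi_min_max_assignment(cost):
    """Brute-force lexicographical min-max assignment (illustration only).

    cost[i][j] is a hypothetical completion time of task i if it runs in
    slot j. Each slot holds at most one task. Returns a pair
    (assignment, times_desc): assignment[i] is the slot chosen for task i,
    and times_desc lists the resulting completion times, worst first.
    """
    n_tasks = len(cost)
    n_slots = len(cost[0])
    best_key = None
    best_assignment = None
    # Enumerate every way to place the tasks in distinct slots.
    for slots in permutations(range(n_slots), n_tasks):
        times = [cost[i][slots[i]] for i in range(n_tasks)]
        # Compare assignments by their completion times sorted worst-first:
        # minimize the slowest task, then the second slowest, and so on.
        key = sorted(times, reverse=True)
        if best_key is None or key < best_key:
            best_key = key
            best_assignment = slots
    return list(best_assignment), best_key


# Hypothetical 3-task, 3-slot instance.
cost = [
    [5, 5, 9],
    [3, 8, 4],
    [6, 4, 2],
]
assignment, times_desc = lexi_min_max_assignment(cost)
print(assignment, times_desc)
```

In this instance two assignments share the same worst completion time of 5, and the tie is broken by the second-worst time, which is what distinguishes a lexicographical min-max objective from a plain min-max one. Enumeration grows factorially with the number of tasks, so it is only suitable for illustrating the objective on toy inputs.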
Pages: 9