GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers

Authors
Moïse W. Convolbo
Jerry Chou
Ching-Hsien Hsu
Yeh Ching Chung
Affiliations
[1] National Tsing Hua University, School of Mathematics and Big Data
[2] Foshan University
[3] Chung Hua University
Source
Computing | 2018 / Vol. 100
Keywords
Geo-distributed; Data center; Scheduling; Data locality; Batch jobs; Big data analysis; 90C05 Linear programming; 90C27 Combinatorial optimization; 90C46 Optimality conditions, duality
Abstract
Today, data-intensive applications rely on geographically distributed systems for data collection, storage and processing. Data locality is a prominent technique for improving application performance and reducing the impact of network latency: jobs are scheduled directly on the nodes hosting the data to be processed. MapReduce and Dryad are examples of frameworks that exploit locality by splitting jobs into multiple tasks, which are dispatched to process portions of the data locally. However, as the ecosystem of big data analysis has shifted from single clusters to geo-distributed data centers, it is unavoidable that some data must still be transferred over the network in order to reduce the schedule length. Existing scheduling techniques nevertheless lack mechanisms that efficiently blend data locality with inter-data center transfer requirements for data-intensive processing across dispersed data centers. The objective of this work is therefore to formulate and solve the makespan optimization problem for data-intensive job scheduling on geo-distributed data centers. To this end, we first formulate task placement and data access as a linear program and solve it with the GLPK solver. We then present a low-complexity heuristic scheduling algorithm called GeoDis, which lets data locality cope with the data transfer requirement to achieve a better makespan. Experiments with various realistic traces and synthetically generated workloads show that GeoDis reduces the makespan of processing jobs by 44% compared to state-of-the-art algorithms and stays within 91% of the optimal solution obtained by the LP solver.
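The trade-off the abstract describes, running a task where its data lives versus paying a transfer cost to use an idle data center, can be illustrated with a small sketch. This is not the GeoDis algorithm or the paper's LP formulation; it is a generic greedy earliest-finish heuristic under simplifying assumptions chosen here for illustration: each data center runs one task at a time, all inter-data center links share one bandwidth, and the names `Task`, `schedule`, `home`, `work` and `size` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    home: str    # data center that already holds the input data
    work: float  # processing time in seconds
    size: float  # input size in GB


def schedule(tasks, centers, bandwidth):
    """Greedy earliest-finish placement (illustrative, not GeoDis).

    Assumes one execution slot per data center and a single uniform
    bandwidth (GB/s) between any two data centers.
    Returns (placement dict, makespan).
    """
    ready = {dc: 0.0 for dc in centers}  # time each data center becomes free
    placement = {}
    # Place the longest tasks first; ties keep their input order.
    for t in sorted(tasks, key=lambda t: t.work, reverse=True):
        best_dc, best_finish = None, float("inf")
        for dc in centers:
            # Local execution needs no transfer; remote execution must
            # wait until the input has been copied over the network.
            transfer = 0.0 if dc == t.home else t.size / bandwidth
            finish = max(ready[dc], transfer) + t.work
            if finish < best_finish:
                best_dc, best_finish = dc, finish
        ready[best_dc] = best_finish
        placement[t.name] = best_dc
    return placement, max(ready.values())


if __name__ == "__main__":
    tasks = [Task("t1", "A", 10, 2), Task("t2", "A", 10, 2), Task("t3", "A", 10, 2)]
    print(schedule(tasks, ["A", "B"], bandwidth=1.0))
    print(schedule(tasks, ["A"], bandwidth=1.0))  # strict locality only
```

In this toy input all three tasks have their data in data center A. Allowing one task to move to B costs a 2-second transfer but shortens the makespan from 30 to 20 seconds compared with strict locality.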
Pages: 21-46
Page count: 25
Related papers
  • [1] GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers
    Convolbo, Moise W.
    Chou, Jerry
    Hsu, Ching-Hsien
    Chung, Yeh Ching
    COMPUTING, 2018, 100 (01) : 21 - 46
  • [2] Awan: Locality-aware Resource Manager for Geo-distributed Data-intensive Applications
    Jonathan, Albert
    Chandra, Abhishek
    Weissman, Jon
    PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E), 2016, : 32 - 41
  • [3] Workload-Aware Scheduling Across Geo-distributed Data Centers
    Jin, Yibo
    Gao, Yuan
    Qian, Zhuzhong
    Zhai, Mingyu
    Peng, Hui
    Lu, Sanglu
    2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 1455 - 1462
  • [4] Customer satisfaction-aware scheduling for utility maximization on geo-distributed data centers
    Jing, Chao
    Zhu, Yanmin
    Li, Minglu
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2015, 27 (05): : 1334 - 1354
  • [5] Electricity and Carbon-aware Task Scheduling in Geo-distributed Internet Data Centers
    Wang, Peng
    Liu, Wenyu
    Cheng, Ming
    Ding, Zhaohao
    Wang, Yi
    2022 IEEE/IAS INDUSTRIAL AND COMMERCIAL POWER SYSTEM ASIA (I&CPS ASIA 2022), 2022, : 1416 - 1421
  • [6] Temperature Aware Workload Management in Geo-Distributed Data Centers
    Xu, Hong
    Feng, Chen
    Li, Baochun
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015, 26 (06) : 1743 - 1753
  • [7] MapReduce Task Scheduling in Heterogeneous Geo-Distributed Data Centers
    Li, Xiaoping
    Chen, Fuchao
    Ruiz, Ruben
    Zhu, Jie
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (06) : 3317 - 3329
  • [8] VNF Deployment and Flow Scheduling in Geo-distributed Data Centers
    Gu, Lin
    Chen, Xiaoxiao
    Jin, Hai
    Lu, Feng
    2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2018,
  • [9] A Scheduling Framework for Periodic Tasks in Geo-Distributed Data Centers
    Li, Yan
    Zhang, Hong
    Wang, Yong
    Liu, Xinran
    Zhang, Peng
    9TH IEEE INTERNATIONAL SYMPOSIUM ON SERVICE-ORIENTED SYSTEM ENGINEERING (SOSE 2015), 2015, : 247 - 252
  • [10] DRASH: A Data Replication-Aware Scheduler in Geo-distributed Data Centers
    Convolbo, Moise W.
    Chou, Jerry
    Lu, Shihyu
    Chung, Yeh Ching
    2016 8TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM 2016), 2016, : 302 - 309