Dependency-Aware Network Adaptive Scheduling of Data-Intensive Parallel Jobs

Cited by: 9
Authors
Wang, Shaoqi [1]
Chen, Wei [1]
Zhou, Xiaobo [1]
Zhang, Liqiang [2]
Wang, Yin [3]
Affiliations
[1] Univ Colorado, Dept Comp Sci, Colorado Springs, CO 80918 USA
[2] Indiana Univ, Dept Comp & Informat Sci, South Bend, IN 46615 USA
[3] Tongji Univ, Dept Comp Sci, Shanghai 201804, Peoples R China
Funding
National Science Foundation (USA)
Keywords
Adaptive task scheduler; network adaptive; job dependency; data-parallel clusters; MapReduce
DOI
10.1109/TPDS.2018.2866993
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Datacenter clusters often run data-intensive jobs in parallel to improve resource utilization and cost efficiency. The performance of parallel jobs is often constrained by the cluster's hard-to-scale network bisection bandwidth. Various solutions have been proposed to address this issue; however, most of them do not consider inter-job data dependencies and schedule jobs independently of one another. In this work, we find that aggregating and co-locating the data and tasks of dependent jobs offers an extra opportunity to improve data locality, which can greatly enhance job performance. We propose and design Dawn, a dependency-aware network-adaptive scheduler that includes an online plan and an adaptive task scheduler. The online plan, taking job dependencies into consideration, determines where (i.e., on which preferred racks) to place tasks in order to proactively aggregate dependent data. The task scheduler, based on the output of the online plan and the dynamic network status, adaptively schedules tasks to co-locate with the dependent data in order to take advantage of data locality. We implement Dawn on Apache Yarn and evaluate it on physical and virtual clusters using various machine learning and query workloads. Results show that Dawn improves cluster throughput by up to 73 and 38 percent compared to Fair Scheduler and ShuffleWatcher, respectively. Dawn not only significantly enhances the performance of jobs with dependencies, but also works well for jobs without dependencies.
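The two-stage idea in the abstract (a plan that names preferred racks, then a scheduler that adapts to network status) can be illustrated with a toy sketch. This is not Dawn's algorithm, which the abstract does not specify. The `Rack` fields, the function names `preferred_rack` and `place_task`, and the `congestion_threshold` value are all hypothetical, chosen only to show how a locality preference might be combined with a network-aware fallback.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    name: str
    dependent_bytes: int     # hypothetical: dependent input data already on this rack
    free_slots: int          # hypothetical: task slots currently available
    link_utilization: float  # hypothetical: uplink usage, 0.0 (idle) to 1.0 (saturated)


def preferred_rack(racks):
    """Toy 'plan' step: prefer the rack that already holds the most dependent data."""
    return max(racks, key=lambda r: r.dependent_bytes)


def place_task(racks, congestion_threshold=0.8):
    """Toy 'adaptive' step (an assumption, not the paper's policy).

    Use the preferred rack when it has a free slot, since the data is local.
    Otherwise fall back to the least-congested rack that has capacity, or
    return None to defer the task when every candidate is full or congested.
    """
    target = preferred_rack(racks)
    if target.free_slots > 0:
        return target.name
    candidates = [
        r for r in racks
        if r.free_slots > 0 and r.link_utilization < congestion_threshold
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.link_utilization).name
```

In this sketch a task goes to the data-rich rack whenever possible, and remote placement is allowed only over a link that is not already heavily used, which mirrors the abstract's combination of data locality and dynamic network status at a very coarse level.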
Pages: 515-529
Number of pages: 15
Related papers
50 in total
  • [31] SwiftS: A Dependency-Aware and Resource Efficient Scheduling for High Throughput in Clouds
    Liu, Jinwei
    Cheng, Long
    [J]. IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (IEEE INFOCOM WKSHPS 2021), 2021,
  • [32] Dependency-Aware Dynamic Task Scheduling in Mobile-Edge Computing
    Wang, Mingzhi
    Ma, Tao
    Wu, Tao
    Chang, Chao
    Yang, Fang
    Wang, Huaixi
    [J]. 2020 16TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING (MSN 2020), 2020, : 785 - 790
  • [33] Spear: Optimized Dependency-Aware Task Scheduling with Deep Reinforcement Learning
    Hu, Zhiming
    Tu, James
    Li, Baochun
    [J]. 2019 39TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2019), 2019, : 2037 - 2046
  • [34] Dependency-Aware Parallel Offloading and Computation in MEC-Enabled Networks
    Kai, Caihong
    Xiao, Shifeng
    Yi, Yibo
    Peng, Min
    Huang, Wei
    [J]. IEEE COMMUNICATIONS LETTERS, 2022, 26 (04) : 853 - 857
  • [35] Cooperative Job Scheduling and Data Allocation for Busy Data-Intensive Parallel Computing Clusters
    Liu, Guoxin
    Shen, Haiying
    Wang, Haoyu
    [J]. PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019), 2019,
  • [36] Data intensive and network aware (DIANA) grid scheduling
    McClatchey R.
    Anjum A.
    Stockinger H.
    Ali A.
    Willers I.
    Thomas M.
    [J]. Journal of Grid Computing, 2007, 5 (1) : 43 - 64
  • [37] LRC: Dependency-Aware Cache Management for Data Analytics Clusters
    Yu, Yinghao
    Wang, Wei
    Zhang, Jun
    Ben Letaief, Khaled
    [J]. IEEE INFOCOM 2017 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2017,
  • [38] CONFUZZIUS: A Data Dependency-Aware Hybrid Fuzzer for Smart Contracts
    Torres, Christof Ferreira
    Iannillo, Antonio Ken
    Gervais, Arthur
    State, Radu
    [J]. 2021 IEEE EUROPEAN SYMPOSIUM ON SECURITY AND PRIVACY (EUROS&P 2021), 2021, : 103 - 119
  • [39] Towards Dependency-Aware Cache Management for Data Analytics Applications
    Yu, Yinghao
    Zhang, Chengliang
    Wang, Wei
    Zhang, Jun
    Ben Letaief, Khaled
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2022, 10 (01) : 706 - 723
  • [40] Task scheduling and file replication for data-intensive jobs with batch-shared I/O
    Khanna, Gaurav
    Vydyanathan, Nagavijayalakshmi
    Catalyurek, Umit
    Kurc, Tahsin
    Krishnamoorthy, Sriram
    Sadayappan, P.
    Saltz, Joel
    [J]. HPDC-15: PROCEEDINGS OF THE 15TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, 2005, : 241 - 252