DAG-aware harmonizing job scheduling and data caching for disaggregated analytics frameworks

Cited by: 0
Authors
Tong, Yulai [1,2]
Liu, Jiazhen [2]
Wang, Hua [1,2]
He, Mingjian [2]
Zhou, Ke [1,2]
He, Rongfeng [3]
Zhang, Qin [3]
Wang, Cheng [3]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan 430074, Peoples R China
[3] Huawei Cloud Comp Technol Co Ltd, Chengdu 611730, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data analytics frameworks; Job scheduling; Cache management; Storage disaggregation;
DOI
10.1016/j.future.2024.03.005
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Modern data analytics frameworks often integrate with external storage services, which can lead to storage bottlenecks. Existing caching and prefetching solutions use high-level information from data analytics frameworks to forecast future data accesses, then use these predictions to prefetch data into the cache and manage the cache contents. However, this approach overlooks a fundamental opportunity: rather than caching data given a prediction of job execution, influencing the job execution order can enable more effective caching and prefetching. With this key insight, we introduce a novel system called TRIPOD, designed to synchronize job scheduling and data caching for analytics frameworks. Leveraging the higher-level information provided by analytics frameworks, TRIPOD explores the best-suited job execution order, guided by developed heuristics, to facilitate prefetching and caching. To fully exploit the potential of TRIPOD, we also introduce a novel caching strategy named CAP. This strategy not only acknowledges the job scheduling order but also offers fine-grained control over object prefetching and eviction. Our evaluation, conducted using standard analytic benchmarks (TPC-H and TPC-DS), demonstrates that TRIPOD achieves a speedup of up to 1.7x over state-of-the-art approaches. Moreover, when CAP is employed to make caching decisions, performance can be further improved (by as much as 1.5x).
Pages: 116-129
Page count: 14
Related papers
49 in total
  • [1] Tripod: Harmonizing Job Scheduling and Data Caching for Analytics Frameworks
    Tong, Yulai
    Wang, Cheng
    Liu, Jiazhen
    Wang, Hua
    Zhou, Ke
    2022 IEEE 40TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2022), 2022, : 610 - 618
  • [2] DAG-Aware Optimization for Geo-Distributed Data Analytics
    Wang, Qingyuan
    Gao, Bin
    Zhou, Zhi
    Xu, Fei
    Chenghao, Ouyang
    PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023, 2023, : 472 - 481
  • [3] Branch Scheduling: DAG-Aware Scheduling for Speeding up Data-Parallel Jobs
    Hu, Zhiyao
    Li, Dongsheng
    Zhang, Yiming
    Guo, Deke
    Li, Ziyang
    PROCEEDINGS OF THE IEEE/ACM INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS 2019), 2019,
  • [4] DAG-Aware Joint Task Scheduling and Cache Management in Spark Clusters
    Xu, Yinggen
    Liu, Liu
    Ding, Zhijun
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020, 2020, : 378 - 387
  • [5] Performance Improvement of DAG-Aware Task Scheduling Algorithms with Efficient Cache Management in Spark
    Zhao, Yao
    Dong, Jian
    Liu, Hongwei
    Wu, Jin
    Liu, Yanxin
    ELECTRONICS, 2021, 10 (16)
  • [6] CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks
    Park, Seongsoo
    Jeong, Minseop
    Han, Hwansoo
    SENSORS, 2021, 21 (07)
  • [7] Energy-Aware Streaming Analytics Job Scheduling for Edge Computing
    Trihinas, Demetris
    Symeonides, Moysis
    Georgiou, Joanna
    Pallis, George
    Dikaiakos, Marios D.
    2023 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE, CLOUDCOM 2023, 2023, : 161 - 168
  • [8] Job-aware scheduling for big data processing
    Wang, Zhigang
    Shen, Yanming
    2015 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA (CCBD), 2015, : 177 - 180
  • [9] Network Aware Job Scheduling in Green Data Centers
    Cavdar, Derya
    Alagoz, Fatih
    2014 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS (ICNC), 2014, : 168 - 172
  • [10] Genetic Algorithm based Job Scheduling for Big Data Analytics
    Lu, Qinghua
    Li, Shanshan
    Zhang, Weishan
    2015 INTERNATIONAL CONFERENCE ON IDENTIFICATION, INFORMATION, AND KNOWLEDGE IN THE INTERNET OF THINGS (IIKI), 2015, : 33 - 38