Optimal Query Plans for Geo-distributed Data Analytics at Scale

Cited by: 0
Authors
Pradhan, Ahana
Karthik, Srinivas
Subramanya, Raghunandan
Affiliations
Keywords
Geo-distributed Analytics; Bigdata engine; Query planning;
DOI
10.1145/3632410.3632424
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Geo-distributed Data Analytics (GDA) is crucial for organizations that handle global data spread across numerous data centers (DCs) worldwide. Prior work optimizes geo-distributed queries by moving computation closer to the data and/or by choosing join order and job location heuristically, which yields low-quality GDA plans. In this work, we propose a novel approach, based on dynamic programming, that holistically optimizes join order and location to produce optimal GDA plans. It is built on a new cost model that incorporates additional GDA parameters such as WAN cost, DC locations, and heterogeneous DC capabilities. Our strong search-space pruning technique lets us scale to hundreds of DCs with small overheads while retaining plan optimality. We implement our solution, GDA-OPT, on an open-source big data system that supports cross-DC analytics. The results show that its plans improve execution performance over the state-of-the-art solution by 4X on average.
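The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of choosing join order and join location together with dynamic programming. It is not GDA-OPT: the function name `optimize_joins`, the cost terms (`wan`, `cpu`), the uniform `selectivity`, and the size estimate are all assumptions made for illustration, and no search-space pruning is attempted.

```python
def optimize_joins(sizes, home, wan, cpu, selectivity):
    """Toy DP over (set of relations, result location). All inputs are hypothetical.

    sizes[i]    : size of base relation i
    home[i]     : index of the DC that stores relation i
    wan[a][b]   : cost per unit of data shipped from DC a to DC b (wan[d][d] == 0)
    cpu[d]      : processing cost per input unit at DC d
    selectivity : one uniform join selectivity (an assumed simplification)

    Returns a list c where c[d] is the cheapest cost of having the join of
    all relations available at DC d.
    """
    n, m = len(sizes), len(cpu)
    full = (1 << n) - 1

    def est_size(mask):
        # Assumed estimate: product of base sizes, one selectivity per join.
        k = bin(mask).count("1")
        out = selectivity ** (k - 1)
        for i in range(n):
            if (mask >> i) & 1:
                out *= sizes[i]
        return out

    # best[mask][d]: cheapest cost to have the join of `mask` available at DC d.
    best = {}
    for i in range(n):
        best[1 << i] = [sizes[i] * wan[home[i]][d] for d in range(m)]

    # Every proper sub-mask is numerically smaller, so increasing order is safe.
    for mask in range(1, full + 1):
        if mask in best:
            continue
        join_at = [float("inf")] * m
        sub = (mask - 1) & mask
        while sub:  # enumerate every split of `mask` into two non-empty parts
            other = mask ^ sub
            work = est_size(sub) + est_size(other)
            for d in range(m):
                cost = best[sub][d] + best[other][d] + cpu[d] * work
                join_at[d] = min(join_at[d], cost)
            sub = (sub - 1) & mask
        out_size = est_size(mask)
        # Optionally ship the join result from the DC that ran the join.
        best[mask] = [
            min(join_at[d] + out_size * wan[d][t] for d in range(m))
            for t in range(m)
        ]
    return best[full]
```

In this sketch the state couples a subset of relations with the DC where its result resides, so a cheap join order that leaves data in an expensive location is not chosen blindly. The exhaustive enumeration is exponential in the number of relations, which is why a real optimizer would need pruning of the kind the abstract mentions.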
Pages: 247-251
Number of pages: 5
Related papers
50 in total
  • [11] A Network Cost-aware Geo-distributed Data Analytics System
    Oh, Kwangsung
    Chandra, Abhishek
    Weissman, Jon
    2020 20TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2020), 2020, : 649 - 658
  • [12] Delay-Resistant Geo-Distributed Analytics
    Mostafaei, Habib
    Smaragdakis, Georgios
    Zinner, Thomas
    Feldmann, Anja
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2022, 19 (04): : 4734 - 4749
  • [13] Optimizing Geo-Distributed Data Analytics with Coordinated Task Scheduling and Routing
    Zhao, Laiping
    Yang, Yanan
    Munir, Ali
    Liu, Alex X.
    Li, Yue
    Qu, Wenyu
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (02) : 279 - 293
  • [14] Multi-Objective Optimizations in Geo-Distributed Data Analytics Systems
    Niu, Zhaojie
    He, Bingsheng
    Zhou, Amelie Chi
    Tong, Lau Chiew
    2017 IEEE 23RD INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2017, : 519 - 528
  • [15] Network Cost-Aware Geo-Distributed Data Analytics System
    Oh, Kwangsung
    Zhang, Minmin
    Chandra, Abhishek
    Weissman, Jon
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (06) : 1407 - 1420
  • [16] Cost-Minimizing Online Algorithms for Geo-Distributed Data Analytics
    Huang, Jiao
    Huang, Jing
    Gao, Shang
    Yang, Bo
    IEEE ACCESS, 2019, 7 : 163515 - 163525
  • [17] Trading Cost and Throughput in Geo-Distributed Analytics With A Two Time Scale Approach
    Xu, Xinping
    Li, Wenxin
    Xu, Renhai
    Qi, Heng
    Li, Keqiu
    Zhou, Xiaobo
    Chen, Sheng
    IEEE TRANSACTIONS ON CLOUD COMPUTING, 2022, 10 (03) : 2163 - 2177
  • [18] Geo-Distributed IoT Data Analytics With Deadline Constraints Across Network Edge
    Chen, Yiting
    Luo, Lailong
    Ren, Bangbang
    Guo, Deke
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (22) : 22914 - 22929
  • [19] A TTL-based Approach for Data Aggregation in Geo-distributed Streaming Analytics
    Kumar, Dhruv
    Li, Jian
    Chandra, Abhishek
    Sitaraman, Ramesh K.
    PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2019, 3 (02)
  • [20] Trading Timeliness and Accuracy in Geo-Distributed Streaming Analytics
    Heintz, Benjamin
    Chandra, Abhishek
    Sitaraman, Ramesh K.
    PROCEEDINGS OF THE SEVENTH ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC 2016), 2016, : 361 - 373