Optimal Query Plans for Geo-distributed Data Analytics at Scale

Authors
Pradhan, Ahana
Karthik, Srinivas
Subramanya, Raghunandan
Keywords
Geo-distributed Analytics; Bigdata engine; Query planning
DOI
10.1145/3632410.3632424
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Geo-distributed Data Analytics (GDA) is crucial for organizations that handle global data across numerous data centers (DCs) worldwide. Prior work optimizes geo-distributed queries by moving computation closer to the data and/or by deciding join order and job location heuristically, which leads to low-quality GDA plans. In this work, we propose a novel approach, based on dynamic programming, that holistically optimizes join order and location to produce optimal GDA plans. It is built on a new cost model that incorporates additional GDA parameters such as WAN cost, DC locations, and heterogeneous DC capabilities. Our strong search-space pruning technique lets us scale to hundreds of DCs with small overheads while retaining plan optimality. We implement our solution, GDA-OPT, on an open-source big data system that supports cross-DC analytics. The results show a 4x average improvement in plan execution performance over the state-of-the-art solution.
Pages: 247-251
Page count: 5
Related papers
50 in total
  • [1] Low Latency Geo-distributed Data Analytics
    Pu, Qifan
    Ananthanarayanan, Ganesh
    Bodik, Peter
    Kandula, Srikanth
    Akella, Aditya
    Bahl, Paramvir
    Stoica, Ion
    ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2015, 45 (04) : 421 - 434
  • [2] Low Latency Geo-distributed Data Analytics
    Pu, Qifan
    Ananthanarayanan, Ganesh
    Bodik, Peter
    Kandula, Srikanth
    Akella, Aditya
    Bahl, Paramvir
    Stoica, Ion
    SIGCOMM'15: PROCEEDINGS OF THE 2015 ACM CONFERENCE ON SPECIAL INTEREST GROUP ON DATA COMMUNICATION, 2015, : 421 - 434
  • [3] WANalytics: Geo-Distributed Analytics for a Data Intensive World
    Vulimiri, Ashish
    Curino, Carlo
    Godfrey, P. Brighten
    Jungblut, Thomas
    Karanasos, Konstantinos
    Padhye, Jitu
    Varghese, George
    SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015, : 1087 - 1092
  • [4] Bohr: Similarity Aware Geo-Distributed Data Analytics
    Li, Hangyu
    Xu, Hong
    Nutanong, Sarana
    CONEXT'18: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON EMERGING NETWORKING EXPERIMENTS AND TECHNOLOGIES, 2018, : 267 - 279
  • [5] Adaptive Partitioning for Large-Scale Graph Analytics in Geo-Distributed Data Centers
    Zhou, Amelie Chi
    Luo, Juanyun
    Qiu, Ruibo
    Tan, Haobin
    He, Bingsheng
    Mao, Rui
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 2818 - 2830
  • [6] Compliant Geo-distributed Query Processing
    Beedkar, Kaustubh
    Quiane-Ruiz, Jorge-Arnulfo
    Markl, Volker
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 181 - 193
  • [7] Plexus: Optimizing Join Approximation for Geo-Distributed Data Analytics
    Wolfrath, Joel
    Chandra, Abhishek
    PROCEEDINGS OF THE 2023 ACM SYMPOSIUM ON CLOUD COMPUTING, SOCC 2023, 2023, : 1 - 16
  • [8] Fast, scalable and geo-distributed PCA for big data analytics
    Adnan, T. M. Tariq
    Tanjim, Md Mehrab
    Adnan, Muhammad Abdullah
    INFORMATION SYSTEMS, 2021, 98 (98)
  • [9] DAG-Aware Optimization for Geo-Distributed Data Analytics
    Wang, Qingyuan
    Gao, Bin
    Zhou, Zhi
    Xu, Fei
    Chenghao, Ouyang
    PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023, 2023, : 472 - 481
  • [10] Yugong: Geo-Distributed Data and Job Placement at Scale
    Huang, Yuzhen
    Shi, Yingjie
    Zhong, Zheng
    Feng, Yihui
    Cheng, James
    Li, Jiwei
    Fang, Haochuan
    Li, Chao
    Guan, Tao
    Zhou, Jingren
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2019, 12 (12) : 2155 - 2169