Optimal Query Plans for Geo-distributed Data Analytics at Scale

Cited by: 0
Authors
Pradhan, Ahana
Karthik, Srinivas
Subramanya, Raghunandan
Keywords
Geo-distributed Analytics; Bigdata engine; Query planning;
DOI
10.1145/3632410.3632424
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Geo-distributed Data Analytics (GDA) is crucial for organizations that handle global data across numerous data centers (DCs) worldwide. Prior work optimizes geo-distributed queries by moving computation closer to the data and/or by choosing join order and job location heuristically, which leads to low-quality GDA plans. In this work, we propose a novel approach, based on dynamic programming, that holistically optimizes join order and location to produce optimal GDA plans. It is built on a new cost model that incorporates additional GDA parameters such as WAN cost, DC locations, and heterogeneous DC capabilities. Our strong search-space pruning technique lets us scale to hundreds of DCs with small overheads while retaining plan optimality. We implement our solution, GDA-OPT, on an open-source big data system that supports cross-DC analytics. The results show an average 4x improvement in plan quality (execution performance) over the state-of-the-art solution.
Pages: 247 - 251
Number of pages: 5
Related papers
50 in total
  • [31] Efficient Graph Query Processing over Geo-Distributed Datacenters
    Yuan, Ye
    Ma, Delong
    Wen, Zhenyu
    Ma, Yuliang
    Wang, Guoren
    Chen, Lei
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 619 - 628
  • [32] SNR: Network-aware Geo-Distributed Stream Analytics
    Mostafaei, Habib
    Afridi, Shafi
    Abawajy, Jemal H.
    21ST IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2021), 2021, : 820 - 827
  • [33] Sketch and Scale Geo-distributed tSNE and UMAP
    Wei, Viska
    Ivkin, Nikita
    Braverman, Vladimir
    Szalay, Alexander S.
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 996 - 1003
  • [34] Renewable Energy-Aware Big Data Analytics in Geo-Distributed Data Centers with Reinforcement Learning
    Xu, Chenhan
    Wang, Kun
    Li, Peng
    Xia, Rui
    Guo, Song
    Guo, Minyi
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2020, 7 (01) : 205 - 215
  • [35] GeeLytics: Geo-distributed Edge Analytics for Large Scale IoT Systems Based on Dynamic Topology
    Cheng, Bin
    Papageorgiou, Apostolos
    Cirillo, Flavio
    Kovacs, Ernoe
    2015 IEEE 2ND WORLD FORUM ON INTERNET OF THINGS (WF-IOT), 2015, : 565 - 570
  • [36] Efficient Geo-Distributed Data Processing with Rout
    Jayalath, Chamikara
    Eugster, Patrick
    2013 IEEE 33RD INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2013, : 470 - 480
  • [37] runData: Re-Distributing Data via Piggybacking for Geo-Distributed Data Analytics Over Edges
    Jin, Yibo
    Qian, Zhuzhong
    Guo, Song
    Zhang, Sheng
    Jiao, Lei
    Lu, Sanglu
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (01) : 40 - 55
  • [38] Distributed Data Strategies to Support Large-Scale Data Analysis Across Geo-Distributed Data Centers
    Emara, Tamer Z.
    Huang, Joshua Zhexue
    IEEE ACCESS, 2020, 8 : 178526 - 178538
  • [39] Compliant Geo-distributed Data Processing in Action
    Beedkar, Kaustubh
    Brekardin, David
    Quiané-Ruiz, Jorge-Arnulfo
    Markl, Volker
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 14 (12) : 2843 - 2846
  • [40] Octopus: Based on Congestion-aware Scheduling on Geo-distributed Big Data Analytics Cluster
    Du, Haizhou
    Zhang, Keke
    Yang, Zhenchen
    2018 5TH INTERNATIONAL CONFERENCE ON SYSTEMS AND INFORMATICS (ICSAI), 2018, : 490 - 495