Optimal Query Plans for Geo-distributed Data Analytics at Scale

Cited by: 0
Authors
Pradhan, Ahana
Karthik, Srinivas
Subramanya, Raghunandan
Keywords
Geo-distributed Analytics; Big data engine; Query planning
DOI
10.1145/3632410.3632424
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Geo-distributed Data Analytics (GDA) is crucial for organizations that handle global data spread across numerous data centers (DCs) worldwide. Prior work optimizes geo-distributed queries by moving computation closer to the data and/or by deciding join order and job location heuristically, which leads to low-quality GDA plans. In this work, we propose a novel approach, based on dynamic programming, that optimizes join order and location holistically to produce optimal GDA plans. It is built on a new cost model that incorporates additional GDA parameters such as WAN cost, DC locations, and heterogeneous DC capabilities. Our strong search-space pruning technique lets us scale to hundreds of DCs with small overheads while retaining plan optimality. We implement our solution, GDA-OPT, on an open-source big data system that supports cross-DC analytics. The results show that, on average, GDA-OPT's plans improve execution performance by 4X over the state-of-the-art solution.
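The abstract describes a dynamic-programming search that chooses join order and join location together, driven by a cost model that includes WAN transfer cost and per-DC capability. The sketch below is a minimal, hypothetical illustration of that idea, not GDA-OPT's actual algorithm or cost model. It assumes a toy cost of (rows shipped times a per-row WAN cost) plus (input rows times a per-row compute cost at the chosen DC), independent join selectivities, and no search-space pruning. The names `optimize`, `card`, `home`, `rows`, `sel`, `wan`, and `cpu` are all illustrative.

```python
from itertools import combinations
from math import prod


def card(subset, rows, sel):
    """Estimated row count of joining every table in `subset`.

    Multiplies base-table sizes by each pairwise selectivity found in `sel`
    (the usual independence assumption); pairs without an entry are treated
    as cross products (selectivity 1.0).
    """
    size = prod(rows[t] for t in subset)
    for a, b in combinations(sorted(subset), 2):
        size *= sel.get(frozenset((a, b)), 1.0)
    return size


def optimize(home, rows, sel, wan, cpu):
    """Choose join order AND join location together (hypothetical toy model).

    Subset-based dynamic programming in the spirit of System R, extended with
    a location dimension; it enumerates bushy plans.
    home[t]   -> DC that stores base table t
    rows[t]   -> row count of base table t
    sel       -> {frozenset({a, b}): join selectivity}
    wan[s][d] -> cost per row shipped from DC s to DC d (0 when s == d)
    cpu[d]    -> cost per input row joined at DC d (models heterogeneous DCs)
    Returns (cost, plan, dc); a plan is a table name or (left, right, dc).
    """
    tables = sorted(rows)
    dcs = sorted(cpu)
    # best[subset][dc] = (cost, plan): cheapest known way to have the join of
    # `subset` materialized at `dc`. Base tables start, for free, at home.
    best = {frozenset([t]): {home[t]: (0.0, t)} for t in tables}

    for size in range(2, len(tables) + 1):
        for combo in combinations(tables, size):
            s = frozenset(combo)
            best[s] = {}
            first, rest = combo[0], combo[1:]
            # `left` always contains `first`, so each unordered split of s
            # is enumerated once (this toy cost model is symmetric).
            for k in range(size - 1):
                for extra in combinations(rest, k):
                    left = frozenset((first,) + extra)
                    right = s - left
                    n_l, n_r = card(left, rows, sel), card(right, rows, sel)
                    for d1, (c1, p1) in best[left].items():
                        for d2, (c2, p2) in best[right].items():
                            for d in dcs:  # candidate location for this join
                                cost = (c1 + c2
                                        + n_l * wan[d1][d] + n_r * wan[d2][d]
                                        + (n_l + n_r) * cpu[d])
                                cur = best[s].get(d)
                                if cur is None or cost < cur[0]:
                                    best[s][d] = (cost, (p1, p2, d))

    dc, (cost, plan) = min(best[frozenset(tables)].items(),
                           key=lambda item: item[1][0])
    return cost, plan, dc


if __name__ == "__main__":
    home = {"A": "us", "B": "eu", "C": "eu"}
    rows = {"A": 1000, "B": 100, "C": 10}
    sel = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.1}
    wan = {"us": {"us": 0, "eu": 5}, "eu": {"us": 5, "eu": 0}}
    cpu = {"us": 1, "eu": 1}
    print(optimize(home, rows, sel, wan, cpu))
```

In this three-table example, the search joins B and C inside `eu`, ships only the 100-row intermediate result to `us`, and performs the final join there instead of moving the 1,000-row table A across the WAN. Keeping one entry per (table subset, DC) pair is what makes location part of the search rather than a post-hoc decision; the cost grows with the number of DCs, which is why the pruning the abstract mentions (and this sketch omits) matters at the scale of hundreds of DCs.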
Pages: 247-251
Number of pages: 5
Related papers (50 in total)
  • [21] Optimizing Timeliness and Cost in Geo-Distributed Streaming Analytics
    Heintz, Benjamin
    Chandra, Abhishek
    Sitaraman, Ramesh K.
    IEEE Transactions on Cloud Computing, 2020, 8 (01): 232-245
  • [24] Optimizing the Cost-Performance Tradeoff for Geo-distributed Data Analytics with Uncertain Demand
    Li, Wenxin
    Xu, Renhai
    Qi, Heng
    Li, Keqiu
    Zhou, Xiaobo
    2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS), 2017
  • [25] Run Data Run! Re-distributing Data via Piggybacking for Geo-distributed Data Analytics
    Li, Yefei
    Jin, Yibo
    Chen, Haiyang
    Xi, Wenchao
    Ji, Mingtao
    Zhang, Sheng
    Qian, Zhuzhong
    Lu, Sanglu
    2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom 2019), 2019: 356-363
  • [26] Unicorn: Unified resource orchestration for multi-domain, geo-distributed data analytics
    Xiang, Qiao
    Wang, X. Tony
    Zhang, J. Jensen
    Newman, Harvey
    Yang, Y. Richard
    Liu, Y. Jace
    Future Generation Computer Systems: The International Journal of eScience, 2019, 93: 188-197
  • [27] Unicorn: Unified Resource Orchestration for Multi-Domain, Geo-Distributed Data Analytics
    Xiang, Qiao
    Chen, Shenshen
    Gao, Kai
    Newman, Harvey
    Taylor, Ian
    Zhang, Jingxuan
    Yang, Yang Richard
    2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), 2017
  • [28] Think Before You Shuffle: Data-Driven Shuffles for Geo-Distributed Analytics
    Goyal, Maruth
    Akella, Aditya
    Proceedings of the International Workshop on Big Data in Emergent Distributed Environments (BiDEDE 2022), 2022
  • [29] A survey on bandwidth-aware geo-distributed frameworks for big-data analytics
    Bergui, Mohammed
    Najah, Said
    Nikolov, Nikola S.
    Journal of Big Data, 2021, 8 (01)
  • [30] ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges
    Jin, Yibo
    Qian, Zhuzhong
    Guo, Song
    Zhang, Sheng
    Wang, Xiaoliang
    Lu, Sanglu
    Proceedings of the 47th International Conference on Parallel Processing, 2018