Optimal Query Plans for Geo-distributed Data Analytics at Scale

Cited by: 0
Authors
Pradhan, Ahana
Karthik, Srinivas
Subramanya, Raghunandan
Affiliations
Keywords
Geo-distributed Analytics; Bigdata engine; Query planning
DOI
10.1145/3632410.3632424
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Geo-distributed Data Analytics (GDA) is crucial for organizations that handle global data across numerous data centers (DCs) worldwide. Prior work optimizes geo-distributed queries by moving computation closer to the data and/or by choosing join order and job location heuristically, which yields low-quality GDA plans. In this work, we propose a novel approach, based on dynamic programming, that holistically optimizes join order and location to produce optimal GDA plans. It is built on a new cost model that incorporates additional GDA parameters such as WAN cost, DC locations, and heterogeneous DC capabilities. Our strong search-space pruning technique helps us scale to hundreds of DCs with small overheads while retaining plan optimality. We implement our solution, GDA-OPT, on an open-source big data system that supports cross-DC analytics. The results show an average 4X improvement in plan quality (execution performance) over the state-of-the-art solution.
Pages: 247-251
Number of pages: 5
Related papers
50 in total
  • [41] Traffic-Aware Geo-Distributed Big Data Analytics with Predictable Job Completion Time
    Li, Peng
    Guo, Song
    Miyazaki, Toshiaki
    Liao, Xiaofei
    Jin, Hai
    Zomaya, Albert Y.
    Wang, Kun
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (06) : 1785 - 1796
  • [42] Optimal Online Data Partitioning for Geo-Distributed Machine Learning in Edge of Wireless Networks
    Lyu, Xinchen
    Ren, Chenshan
    Ni, Wei
    Tian, Hui
    Liu, Ren Ping
    Dutkiewicz, Eryk
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2019, 37 (10) : 2393 - 2406
  • [43] An Optimal Task Placement Strategy in Geo-Distributed Data Centers Involving Renewable Energy
    Wang, Ran
    Lu, Yiwen
    Zhu, Kun
    Hao, Jie
    Wang, Ping
    Cao, Yue
    IEEE ACCESS, 2018, 6 : 61948 - 61958
  • [44] Optimal Task Placement with QoS Constraints in Geo-Distributed Data Centers Using DVFS
    Gu, Lin
    Zeng, Deze
    Barnawi, Ahmed
    Guo, Song
    Stojmenovic, Ivan
    IEEE TRANSACTIONS ON COMPUTERS, 2015, 64 (07) : 2049 - 2059
  • [45] Green Computing with Geo-Distributed Heterogeneous Data Centers
    Pasricha, Sudeep
    Hogade, Ninad
    Siegel, Howard Jay
    Maciejewski, Anthony A.
    2019 TENTH INTERNATIONAL GREEN AND SUSTAINABLE COMPUTING CONFERENCE (IGSC), 2019,
  • [46] Investigation of Network Traffic in Geo-Distributed Data Centers
    Koshiba, Yutaka
    Chen, Wuhui
    Yamada, Yuichi
    Tanaka, Takazumi
    Paik, Incheon
    2015 IEEE 7TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE & TECHNOLOGY (ICAST), 2015, : 174 - 179
  • [47] Fast Big Data Analysis in Geo-Distributed Cloud
    Li, Yue
    Zhao, Laiping
    Cui, Chenzhou
    Yu, Ce
    2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016, : 388 - 391
  • [48] Fast media caching for geo-distributed data centers
    Zhang, Wei
    Wen, Yonggang
    Liu, Fang
    Chen, Yiqiang
    Fan, Rui
    COMPUTER COMMUNICATIONS, 2018, 120 : 46 - 57
  • [49] Holistic Management of Sustainable Geo-Distributed Data Centers
    Abbasi, Zahra
    Gupta, Sandeep K. S.
    2015 IEEE 22ND INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2015, : 426 - 435
  • [50] AggNet: Cost-Aware Aggregation Networks for Geo-distributed Streaming Analytics
    Kumar, Dhruv
    Ahmad, Sohaib
    Chandra, Abhishek
    Sitaraman, Ramesh K.
    2021 ACM/IEEE 6TH SYMPOSIUM ON EDGE COMPUTING (SEC 2021), 2021, : 297 - 311