Cost-Minimizing Online Algorithms for Geo-Distributed Data Analytics

Cited by: 1
Authors
Huang, Jiao [1 ,2 ]
Huang, Jing [1 ,2 ]
Gao, Shang [1 ,2 ]
Yang, Bo [1 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China
[2] Jilin Univ, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Approximate nested query; distributed stream processing; resource allocation; error guarantee; CLOUD;
DOI
10.1109/ACCESS.2019.2951682
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Modern enterprises often operate geographically distributed datacenters around the globe. In such environments, datasets are naturally collected and stored in different datacenters and are later queried for complex analytics. In this paper, we study the Wide-Area Data Analytics problem, which aims to efficiently control data movement and achieve low latency for overall query processing, both constrained by limited and expensive network resources across datacenters. Previous work focuses on offline settings with single analytical queries and does not consider time when optimizing system performance; it therefore ignores the dynamics of data and task placement in terms of inter-datacenter (inter-DC) bandwidth utilization. We consider the online setting and formulate a cost-minimizing optimization problem over time for processing arbitrary Directed Acyclic Graph (DAG) queries. Considering the dynamics of network resource usage, we develop two online algorithms, Online Switch Resist (OSR) and Most Fixed Horizon Control (MFHC), with good competitive ratios. We perform extensive simulations and comparative studies using the TPC-CH benchmark and verify the efficacy of the proposed algorithms. The proposed algorithms outperform the existing algorithm, and their performance approaches the theoretical optimum.
Pages: 163515-163525
Number of pages: 11