Cost-Minimizing Online Algorithms for Geo-Distributed Data Analytics

Times cited: 1
Authors
Huang, Jiao [1, 2]
Huang, Jing [1, 2]
Gao, Shang [1, 2]
Yang, Bo [1]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China
[2] Jilin Univ, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Approximate nested query; distributed stream processing; resource allocation; error guarantee; CLOUD;
DOI
10.1109/ACCESS.2019.2951682
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Modern enterprises often manage geographically distributed datacenters around the globe. In such an environment, datasets are naturally collected and stored in different datacenters and later queried for complex analytics. In this paper, we study the Wide-Area Data Analytics problem, which aims to efficiently control data movement and achieve low overall query-processing latency, both subject to limited and expensive network resources across datacenters. Previous work focuses on offline settings with single analytical queries and does not consider time when optimizing system performance; it therefore ignores the dynamics of data and task placement with respect to inter-DC bandwidth utilization. In this paper, we consider the online setting and formulate a cost-minimizing optimization problem over time for processing queries structured as arbitrary Directed Acyclic Graphs (DAGs). Taking into account the dynamics of network resource usage, we develop two online algorithms, Online Switch Resist (OSR) and Most Fixed Horizon Control (MFHC), with good competitive ratios. We performed extensive simulations and comparative studies using the TPC-CH benchmark, which verified the efficacy of the proposed algorithms. The proposed approach outperforms the existing algorithm, and its performance approaches the theoretical optimum.
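The abstract does not describe how OSR or MFHC work, so the sketch below does not reproduce either of them. Under deliberately simplified, assumed conditions, it only illustrates two ideas the abstract relies on: placement decisions that incur a cost whenever they change (switching), and the competitive ratio. An online algorithm is c-competitive if, on every input, its total cost is at most c times that of the offline optimum that knows all future costs (some definitions also allow an additive constant). The toy model is hypothetical: K candidate placements, a per-slot operating cost for each placement that is revealed before that slot's decision, and a uniform switching cost `beta`. The function names `offline_opt`, `greedy`, and `lazy` are illustrative. `lazy` is a generic ski-rental-style hysteresis rule, not the paper's OSR.

```python
import math
from typing import List, Sequence

# Hypothetical toy model (not the paper's formulation):
# costs[t][k] is the operating cost of placement k in slot t, revealed
# before the slot-t decision; changing placement costs a flat `beta`.


def offline_opt(costs: Sequence[Sequence[float]], beta: float, start: int = 0) -> float:
    """Minimum total cost with full knowledge of future costs (dynamic programming)."""
    k_count = len(costs[0])
    # best[k]: cheapest cost so far of ending the current slot in placement k.
    best = [0.0 if k == start else math.inf for k in range(k_count)]
    for c in costs:
        best = [
            min(best[j] + (0.0 if j == k else beta) for j in range(k_count)) + c[k]
            for k in range(k_count)
        ]
    return min(best)


def greedy(costs: Sequence[Sequence[float]], beta: float, start: int = 0) -> float:
    """Myopic online rule: minimize this slot's cost plus any switching cost."""
    cur, total = start, 0.0
    for c in costs:
        nxt = min(range(len(c)), key=lambda k: c[k] + (0.0 if k == cur else beta))
        total += c[nxt] + (0.0 if nxt == cur else beta)
        cur = nxt
    return total


def lazy(costs: Sequence[Sequence[float]], beta: float, start: int = 0) -> float:
    """Hysteresis online rule: switch only once staying has cost `beta` more
    than some alternative would have since the last switch (ski-rental style)."""
    cur, total = start, 0.0
    regret: List[float] = [0.0] * len(costs[0])
    for c in costs:
        # Accumulate how much more the current placement cost than each alternative.
        for k in range(len(c)):
            regret[k] += c[cur] - c[k]
        best = max(range(len(c)), key=lambda k: regret[k])
        if best != cur and regret[best] >= beta:
            total += beta
            cur = best
            regret = [0.0] * len(c)
        total += c[cur]
    return total


if __name__ == "__main__":
    # Persistent shift: placement 1 becomes cheaper, but by less than beta per slot.
    costs = [[2.0, 0.0]] * 6
    beta = 3.0
    opt = offline_opt(costs, beta)
    for name, alg in (("greedy", greedy), ("lazy", lazy)):
        cost = alg(costs, beta)
        print(f"{name}: cost={cost}, empirical ratio={cost / opt:.3f}")
```

On this single instance, the myopic greedy rule never switches and pays 12 against an optimum of 3 (ratio 4), while the lazy rule pays 5 (ratio 5/3). One instance only lower-bounds an algorithm's worst-case competitive ratio; an actual competitive-ratio guarantee, like those claimed for OSR and MFHC, requires a proof over all inputs.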
Pages: 163515-163525
Number of pages: 11