Network Cost-Aware Geo-Distributed Data Analytics System

Authors
Oh, Kwangsung [1]
Zhang, Minmin [1]
Chandra, Abhishek [2]
Weissman, Jon [2]
Affiliations
[1] Univ Nebraska, Dept Comp Sci, Omaha, NE 68182 USA
[2] Univ Minnesota Twin Cities, Dept Comp Sci, Minneapolis, MN 55455 USA
Keywords
Task analysis; Data transfer; Bandwidth; Wide area networks; Sparks; Iridium; Distributed databases; Geo-distributed data; multi-DCs; multi cloud providers; data analytics system;
DOI
10.1109/TPDS.2021.3108893
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Many geo-distributed data analytics (GDA) systems have focused on the network performance bottleneck, inter-data center network bandwidth, to improve performance. Unfortunately, these systems may encounter a cost bottleneck ($) because they have not considered data transfer cost ($), one of the most expensive and heterogeneous resources in a multi-cloud environment. In this article, we present Kimchi, a network cost-aware GDA system that meets the cost-performance tradeoff by exploiting data transfer cost heterogeneity to avoid the cost bottleneck. Kimchi determines cost-aware task placement decisions for scheduling tasks given inputs including data transfer cost, network bandwidth, input data size and locations, and the desired cost-performance tradeoff preference. In addition, Kimchi is mindful of data transfer cost in the presence of dynamics. Kimchi has been applied to two common GDA MapReduce models: synchronous barrier and asynchronous push-based shuffle. A Kimchi prototype has been implemented on Spark, and experiments show that it reduces cost by 5% to 24% without impacting performance and reduces query execution time by 45% to 70% without impacting cost, compared to other baseline approaches: centralized, vanilla Spark, and bandwidth-aware (e.g., Iridium). More importantly, Kimchi allows applications to explore a much richer cost-performance tradeoff space in a multi-cloud environment.
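The abstract lists the inputs to a placement decision (data transfer cost, network bandwidth, input data size and locations, and a tradeoff preference) but not the algorithm itself. The following is only a minimal illustrative sketch of how such inputs could be combined for a single task; it is not Kimchi's actual method. The function names, the `alpha` preference parameter, the normalized weighted-sum score, and all prices and bandwidths are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    site: str
    cost_usd: float
    time_s: float


def evaluate(site, input_gb, price_per_gb, bandwidth_gbps):
    """Transfer cost and time if one task runs at `site` and pulls all remote input.

    Assumption: remote transfers run in parallel, so the slowest link sets the time.
    """
    cost = 0.0
    time_s = 0.0
    for src, gb in input_gb.items():
        if src == site:
            continue  # local data is read without a wide-area transfer
        cost += gb * price_per_gb[(src, site)]
        # GB -> gigabits, divided by link bandwidth in Gbps, gives seconds
        time_s = max(time_s, gb * 8 / bandwidth_gbps[(src, site)])
    return Placement(site, cost, time_s)


def choose_site(input_gb, price_per_gb, bandwidth_gbps, alpha):
    """Pick the site with the lowest weighted score.

    alpha = 1.0 considers only cost; alpha = 0.0 considers only transfer time.
    Both terms are normalized by the worst candidate so they are comparable.
    """
    candidates = [
        evaluate(s, input_gb, price_per_gb, bandwidth_gbps) for s in input_gb
    ]
    max_cost = max(c.cost_usd for c in candidates) or 1.0
    max_time = max(c.time_s for c in candidates) or 1.0
    return min(
        candidates,
        key=lambda c: alpha * c.cost_usd / max_cost
        + (1 - alpha) * c.time_s / max_time,
    )


# Hypothetical inputs: data size per site (GB), per-GB transfer price (USD),
# and link bandwidth (Gbps), keyed by (source, destination).
INPUT_GB = {"A": 100, "B": 20, "C": 10}
PRICE = {
    ("B", "A"): 0.09, ("C", "A"): 0.12,
    ("A", "B"): 0.02, ("C", "B"): 0.12,
    ("A", "C"): 0.02, ("B", "C"): 0.09,
}
BANDWIDTH = {
    ("B", "A"): 0.1, ("C", "A"): 0.1,
    ("A", "B"): 1.0, ("C", "B"): 0.5,
    ("A", "C"): 0.5, ("B", "C"): 0.5,
}

if __name__ == "__main__":
    for alpha in (1.0, 0.0):
        print(alpha, choose_site(INPUT_GB, PRICE, BANDWIDTH, alpha))
```

With these made-up numbers, a cost-only preference selects site A and a time-only preference selects site B, which illustrates why heterogeneous transfer prices can pull the placement away from the bandwidth-optimal choice.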
Pages: 1407-1420
Page count: 14
Related papers
50 in total
  • [1] A Network Cost-aware Geo-distributed Data Analytics System
    Oh, Kwangsung
    Chandra, Abhishek
    Weissman, Jon
    [J]. 2020 20TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2020), 2020, : 649 - 658
  • [2] AggNet: Cost-Aware Aggregation Networks for Geo-distributed Streaming Analytics
    Kumar, Dhruv
    Ahmad, Sohaib
    Chandra, Abhishek
    Sitaraman, Ramesh K.
    [J]. 2021 ACM/IEEE 6TH SYMPOSIUM ON EDGE COMPUTING (SEC 2021), 2021, : 297 - 311
  • [3] Cost-Aware Streaming Workflow Allocation on Geo-Distributed Data Centers
    Chen, Wuhui
    Paik, Incheon
    Li, Zhenni
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2017, 66 (02) : 256 - 271
  • [4] Cost-Aware Big Data Processing Across Geo-Distributed Datacenters
    Xiao, Wenhua
    Bao, Weidong
    Zhu, Xiaomin
    Liu, Ling
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (11) : 3114 - 3127
  • [5] Power and Cost-aware Virtual Machine Placement in Geo-distributed Data Centers
    Rawas, Soha
    Zekri, Ahmed
    El Zaart, Ali
    [J]. CLOSER: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, 2018, : 112 - 123
  • [6] Distributed Cost-Aware Fault-Tolerant Load Balancing in Geo-Distributed Data Centers
    Tripathi, Rakesh
    Sivaraman, Vignesh
    Tamarapalli, Venkatesh
    [J]. IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING, 2022, 6 (01) : 472 - 483
  • [7] Bohr: Similarity Aware Geo-Distributed Data Analytics
    Li, Hangyu
    Xu, Hong
    Nutanong, Sarana
    [J]. CONEXT'18: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON EMERGING NETWORKING EXPERIMENTS AND TECHNOLOGIES, 2018, : 267 - 279
  • [8] Cost-aware Capacity Provisioning for Fault-tolerant Geo-distributed Data Centers
    Tripathi, Rakesh
    Vignesh, S.
    Tamarapalli, Venkatesh
    [J]. 2016 8TH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORKS (COMSNETS), 2016,
  • [9] SNR: Network-aware Geo-Distributed Stream Analytics
    Mostafaei, Habib
    Afridi, Shafi
    Abawajy, Jemal H.
    [J]. 21ST IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2021), 2021, : 820 - 827
  • [10] DAG-Aware Optimization for Geo-Distributed Data Analytics
    Wang, Qingyuan
    Gao, Bin
    Zhou, Zhi
    Xu, Fei
    Chenghao, Ouyang
    [J]. PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023, 2023, : 472 - 481