Harmony: An Approach for Geo-distributed Processing of Big-Data Applications

Cited by: 4
Authors
Zhang, Han [1]
Ramapantulu, Lavanya [2]
Teo, Yong Meng [1]
Affiliations
[1] Natl Univ Singapore, Dept Comp Sci, Singapore, Singapore
[2] Int Inst Informat Technol, Comp Sci Grp, Hyderabad, India
Keywords
geo-distributed processing; data-centers; performance analysis; scheduling; MapReduce
DOI
10.1109/cluster.2019.8891053
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Big-data application processing is increasingly geo-distributed, a paradigm shift from traditional cluster-based processing frameworks. Because the communication time for data movement across geo-distributed data centers is not a design criterion for traditional cluster-based processing approaches, there are research gaps in the algorithms used for staging and scheduling big-data applications on geo-distributed clusters. We address these gaps by proposing Harmony, an approach consisting of both staging and scheduling strategies to minimize an application's total execution time. First, the staging strategy of Harmony exploits intra-stage parallelism by running concurrent operators within a stage, in contrast to traditional Apache Spark, which uses fine-grained stages, thereby reducing the computation time within each stage. Second, the scheduling strategy of Harmony reduces data transfers between geo-distributed data centers by exploiting data locality, thereby reducing communication time and total execution time. Harmony achieves a speedup of two times with respect to geo-distributed Apache Spark. In addition, Harmony achieves speedups of 1.6 times and 2.1 times over the state-of-the-art framework Iridium for geo-distributed analytics across five locations with uniform and non-uniform network link bandwidths, respectively.
Pages: 160-170
Number of pages: 11