Shared Execution of Recurring Workloads in MapReduce

Cited by: 5
Authors
Lei, Chuan [1]
Zhuang, Zhongfang [1]
Rundensteiner, Elke A. [1]
Eltabakh, Mohamed [1]
Affiliation
[1] Worcester Polytech Inst, Worcester, MA 01609 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2015 / Vol. 8 / No. 07
Funding
National Science Foundation (USA);
DOI
10.14778/2752939.2752941
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the increasing complexity of data-intensive MapReduce workloads, Hadoop must often accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated datasets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commonly expressed as the maximum allowed latency for producing results before their merits decay. The recurring nature of these emerging workloads, combined with their SLA constraints, makes it challenging to share and optimize their execution. While some recent efforts on multi-job optimization in MapReduce have emerged, they focus only on sharing work among ad-hoc jobs on static datasets. Unfortunately, these sharing techniques neither take the recurring nature of the queries into account nor guarantee the satisfaction of the SLA requirements. In this work, we propose the first scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure, called "Helix". Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overheads or unnecessary data scans. Helix then introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them to maximize SLA satisfaction. Our experimental results over real-world datasets confirm that Helix outperforms the state-of-the-art techniques by an order of magnitude.
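The abstract does not spell out how sliced window alignment works, so the following is only a minimal sketch of the general idea, not Helix's actual algorithm. It assumes each recurring query is described by a hypothetical (window, slide) pair in the same time unit, that the aggregate is a simple sum, and that a shared slice length can be taken as the greatest common divisor of all windows and slides (a classic paned-window simplification). Every function and variable name is illustrative.

```python
from functools import reduce
from math import gcd


def slice_length(queries):
    """Shared slice length: gcd of every window and slide (an assumption)."""
    return reduce(gcd, (v for q in queries for v in q))


def partial_sums(values, slice_len):
    """Single pass over the data: each slice is aggregated exactly once."""
    return [sum(values[i:i + slice_len])
            for i in range(0, len(values), slice_len)]


def window_results(partials, slice_len, window, slide):
    """Assemble every complete window from the shared per-slice partials."""
    w, s = window // slice_len, slide // slice_len
    return [sum(partials[i:i + w])
            for i in range(0, len(partials) - w + 1, s)]


# Two hypothetical recurring queries: (window, slide) in time units.
queries = [(6, 2), (4, 4)]
values = list(range(1, 13))           # one reading per time unit, 1..12

sl = slice_length(queries)            # shared slice length
partials = partial_sums(values, sl)   # computed once, reused by all queries
results = {q: window_results(partials, sl, *q) for q in queries}

for q, r in results.items():
    print(q, r)
```

In this sketch both queries read only the per-slice partials, so the raw data is scanned once regardless of how many queries share it. A cost/benefit model and SLA-aware scheduling, as the abstract describes, would sit on top of such a plan and are not modeled here.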
Pages: 714 - 725
Page count: 12