Shared Execution of Recurring Workloads in MapReduce

Cited by: 5
Authors
Lei, Chuan [1]
Zhuang, Zhongfang [1]
Rundensteiner, Elke A. [1]
Eltabakh, Mohamed [1]
Affiliation
[1] Worcester Polytechnic Institute, Worcester, MA 01609 USA
Source
Proceedings of the VLDB Endowment | 2015 / Vol. 8 / No. 7
Funding
U.S. National Science Foundation
DOI
10.14778/2752939.2752941
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
With the increasing complexity of data-intensive MapReduce workloads, Hadoop must often accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated datasets, e.g., the latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commonly expressed as the maximum latency allowed for producing results before their value decays. The recurring nature of these emerging workloads, combined with their SLA constraints, makes it challenging to share and optimize their execution. Although recent efforts on multi-job optimization in MapReduce have emerged, they focus only on sharing work among ad-hoc jobs over static datasets. Unfortunately, these sharing techniques neither take the recurring nature of the queries into account nor guarantee that SLA requirements are met. In this work, we propose Helix, the first scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure. Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overhead or unnecessary data scans. Helix then introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them that maximizes SLA satisfaction. Our experimental results over real-world datasets confirm that Helix outperforms state-of-the-art techniques by an order of magnitude.
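The abstract does not spell out how Helix slices and aligns windows. As a rough, non-authoritative illustration of the general window-slicing idea (not the paper's algorithm), the sketch below assumes each recurring query is described by a window range and a slide in integer time units, and that the shared work is a simple SUM. The timeline is cut at every window start and end of every query, so each record is read once and every query window is reassembled from whole slices. The names `RecurringQuery`, `window_bounds`, `make_slices`, and `shared_sums` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecurringQuery:
    name: str
    window: int  # time span covered by each execution (hypothetical unit: ticks)
    slide: int   # time between consecutive executions


def window_bounds(q: RecurringQuery, horizon: int) -> list[tuple[int, int]]:
    """Half-open [start, end) windows of q; assumes the first run fires once a full window exists."""
    bounds = []
    end = q.window
    while end <= horizon:
        bounds.append((end - q.window, end))
        end += q.slide
    return bounds


def make_slices(queries: list[RecurringQuery], horizon: int) -> list[tuple[int, int]]:
    """Cut the timeline at every window start/end of every query."""
    points = sorted({p for q in queries for w in window_bounds(q, horizon) for p in w})
    return list(zip(points, points[1:]))


def shared_sums(queries: list[RecurringQuery], data: list[int], horizon: int):
    """Answer every query's windowed SUM while scanning each slice of `data` once.

    `data[t]` is the value recorded at time t (a stand-in for newly arrived input).
    """
    slices = make_slices(queries, horizon)
    # One scan per slice: each partial aggregate is shared by all windows that contain it.
    partial = {s: sum(data[s[0]:s[1]]) for s in slices}
    results = {}
    for q in queries:
        # Window endpoints are slice boundaries, so each window is an exact union of slices.
        results[q.name] = [
            sum(v for (a, b), v in partial.items() if start <= a and b <= end)
            for start, end in window_bounds(q, horizon)
        ]
    return results, slices


queries = [RecurringQuery("Q1", window=4, slide=2), RecurringQuery("Q2", window=6, slide=3)]
results, slices = shared_sums(queries, list(range(12)), horizon=12)
print(results)  # {'Q1': [6, 14, 22, 30, 38], 'Q2': [15, 33, 51]}
print(sum(b - a for a, b in slices))  # 12
```

In this toy run the slices cover 12 time units in total, whereas scanning every window separately would read 5 × 4 + 3 × 6 = 38. The cost/benefit model and SLA-aware scheduling described in the abstract are not modeled here.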
Pages: 714-725
Page count: 12