The optimization for recurring queries in big data analysis system with MapReduce

Cited by: 20
Authors
Zhang, Bin [1, 2]
Wang, Xiaoyang [2]
Zheng, Zhigao [3]
Affiliations
[1] Zhejiang Univ Finance & Econ, Hangzhou 310018, Zhejiang, Peoples R China
[2] Fudan Univ, Sch Comp Sci & Technol, Shanghai 201203, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Hubei, Peoples R China
Keywords
Big data; Recurring queries; MapReduce; Data reuse; Local schedule; Execution
DOI
10.1016/j.future.2017.09.063
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
As data-intensive cluster computing systems such as MapReduce grow in popularity, there is a strong need to improve their efficiency. Recurring queries, which are executed repeatedly over long periods on rapidly evolving data-intensive workloads, have become a bedrock component of big data analytics applications. This paper therefore presents optimization strategies for recurring queries in big data analysis. First, it analyzes the efficiency of recurring queries using a MapReduce recurring query model. Second, it proposes the MapReduce consistent window slice algorithm, which not only creates more opportunities for reuse across recurring queries but also, through fine-grained scheduling, greatly reduces the redundant data loaded as input. Third, for data scheduling, it designs a MapReduce late scheduling strategy that improves data processing and optimizes the scheduling of computation resources in a MapReduce cluster. Finally, it constructs efficient data reuse execution plans using a MapReduce recurring query reuse strategy. Experimental results on a variety of workloads show that the algorithms outperform state-of-the-art approaches. (C) 2017 Elsevier B.V. All rights reserved.
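The abstract does not give the details of the consistent window slice algorithm, so the following is only a generic sketch of the underlying idea of slicing a sliding window so that recurring executions can reuse earlier partial results. It is not the paper's method. It assumes a count-per-key query, integer timestamps, and a slice ("pane") width equal to the greatest common divisor of the window range and the slide; the class name `RecurringCount` and all of its members are hypothetical.

```python
from collections import Counter
from math import gcd


class RecurringCount:
    """Hypothetical recurring count-per-key query over a sliding window."""

    def __init__(self, window_range, slide):
        self.window_range = window_range
        self.slide = slide
        # Assumption: panes of width gcd(range, slide) tile every window exactly.
        self.pane = gcd(window_range, slide)
        self.cache = {}          # pane index -> partial Counter
        self.panes_computed = 0  # how many panes were actually scanned

    def _pane_result(self, idx, records):
        # Scan the input only for panes not seen by an earlier execution.
        if idx not in self.cache:
            lo, hi = idx * self.pane, (idx + 1) * self.pane
            self.cache[idx] = Counter(k for ts, k in records if lo <= ts < hi)
            self.panes_computed += 1
        return self.cache[idx]

    def run(self, window_end, records):
        """Evaluate the window [window_end - window_range, window_end)."""
        first = (window_end - self.window_range) // self.pane
        last = window_end // self.pane
        total = Counter()
        for idx in range(first, last):
            total += self._pane_result(idx, records)
        # Panes that ended before this window are never needed again.
        for idx in [i for i in self.cache if i < first]:
            del self.cache[idx]
        return total
```

With a range of 6 and a slide of 2, each new execution overlaps the previous one in two of its three panes, so only one new pane has to be scanned per recurrence.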
Pages: 549-556
Page count: 8