The optimization for recurring queries in big data analysis system with MapReduce

Cited by: 20

Authors:

Zhang, Bin [1,2]

Wang, Xiaoyang [2]

Zheng, Zhigao [3]

Affiliations:

[1] Zhejiang Univ Finance & Econ, Hangzhou 310018, Zhejiang, Peoples R China

[2] Fudan Univ, Sch Comp Sci & Technol, Shanghai 201203, Peoples R China

[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Hubei, Peoples R China

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2018, Vol. 87

Keywords:

Big data; Recurring queries; MapReduce; Data reuse; Local schedule; EXECUTION;

DOI:

10.1016/j.future.2017.09.063

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

As data-intensive cluster computing systems such as MapReduce grow in popularity, there is a strong need to improve their efficiency. Recurring queries, which are executed repeatedly over long periods on rapidly evolving data-intensive workloads, have become a bedrock component of big data analytics applications. This paper therefore presents optimization strategies for recurring queries in big data analysis. First, it analyzes what influences the efficiency of recurring queries, using a MapReduce recurring queries model. Second, it proposes the MapReduce consistent window slice algorithm, which not only creates more opportunities for reuse across recurring queries but also, through fine-grained scheduling, greatly reduces the redundant data read when loading input. Third, for data scheduling, it designs the MapReduce late scheduling strategy, which improves data processing and optimizes the scheduling of computation resources in a MapReduce cluster. Finally, it constructs efficient data-reuse execution plans using the MapReduce recurring queries reuse strategy. Experimental results on a variety of workloads show that the algorithms outperform state-of-the-art approaches. (C) 2017 Elsevier B.V. All rights reserved.
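
The abstract gives no algorithmic detail, so the following is only an illustrative sketch of the general idea behind slicing windows for reuse, not the paper's consistent window slice algorithm. It assumes a recurring count query over a sliding window, cuts the input into slices whose length is the greatest common divisor of the window and slide lengths, aggregates each slice once, and then assembles every window from the cached slice results. The function names, the `(timestamp, key)` record format, and the `horizon` parameter are all hypothetical.

```python
from collections import Counter
from math import gcd


def slice_size(window, slide):
    """Largest slice length that evenly divides both window and slide (assumed policy)."""
    return gcd(window, slide)


def run_recurring_count(records, window, slide, horizon):
    """Count keys per sliding window, reusing per-slice partial counts.

    records: iterable of (timestamp, key) pairs with integer timestamps.
    Returns {window_start: Counter} for every window that fits in [0, horizon).
    """
    size = slice_size(window, slide)

    # Aggregate each slice exactly once; overlapping windows share these partials.
    partials = {}
    for ts, key in records:
        if 0 <= ts < horizon:
            partials.setdefault(ts // size, Counter())[key] += 1

    # Each recurrence of the query merges cached slices instead of rescanning input.
    results = {}
    start = 0
    while start + window <= horizon:
        total = Counter()
        for s in range(start // size, (start + window) // size):
            total.update(partials.get(s, Counter()))
        results[start] = total
        start += slide
    return results
```

With a window of 4 and a slide of 2, consecutive windows overlap by one slice of length 2, so that slice's partial count is computed once and merged into both windows.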

Pages: 549-556

Page count: 8
