Optimizing Cloud MapReduce for Processing Stream Data using Pipelining

Cited by: 5

Authors:

Karve, Rutvik [1]

Dahiphale, Devendra [1]

Chhajer, Amit [1]

Affiliations:

[1] Pune Inst Comp Technol, Dept Comp Engn, Pune, Maharashtra, India

Source:

UKSIM FIFTH EUROPEAN MODELLING SYMPOSIUM ON COMPUTER MODELLING AND SIMULATION (EMS 2011) | 2011

Keywords:

MapReduce; Cloud Computing; Pipelining; Stream Processing; Distributed Computing;

DOI:

10.1109/EMS.2011.76

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Subject classification codes:

081203 ; 0835 ;

Abstract:

Cloud MapReduce (CMR) is a framework for processing large batch data sets in the cloud. Its Map and Reduce phases run sequentially, one after the other, which leads to (1) compulsory batch processing, (2) no parallelization of the Map and Reduce phases, and (3) increased delays. The current implementation is therefore not suited to processing streaming data. We propose a novel architecture that supports streaming data as input by pipelining between the Map and Reduce phases in CMR, ensuring that the output of the Map phase is made available to the Reduce phase as soon as it is produced. This 'Pipelined MapReduce' approach increases parallelism between the Map and Reduce phases, thereby (1) supporting streaming data as input, (2) reducing delays, (3) enabling the user to take 'snapshots' of the approximate output generated in a stipulated time frame, and (4) supporting cascaded MapReduce jobs. This cloud implementation is lightweight and inherently scalable.


Pages: 344-349

Page count: 6
