Optimizing Cloud MapReduce for Processing Stream Data using Pipelining

Cited by: 5

Authors:

Karve, Rutvik [1]

Dahiphale, Devendra [1]

Chhajer, Amit [1]

Affiliation:

[1] Pune Inst Comp Technol, Dept Comp Engn, Pune, Maharashtra, India

Source:

UKSIM FIFTH EUROPEAN MODELLING SYMPOSIUM ON COMPUTER MODELLING AND SIMULATION (EMS 2011) | 2011

Keywords:

MapReduce; Cloud Computing; Pipelining; Stream Processing; Distributed Computing;

DOI:

10.1109/EMS.2011.76

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Cloud MapReduce (CMR) is a framework for processing large batch data sets in the cloud. Its Map and Reduce phases run sequentially, one after the other, which leads to (1) compulsory batch processing, (2) no parallelism between the Map and Reduce phases, and (3) increased delays. The current implementation is therefore not suited to processing streaming data. We propose a novel architecture that supports streaming data as input by pipelining the Map and Reduce phases in CMR, so that the output of the Map phase is made available to the Reduce phase as soon as it is produced. This 'Pipelined MapReduce' approach increases parallelism between the Map and Reduce phases, thereby (1) supporting streaming data as input, (2) reducing delays, (3) enabling the user to take 'snapshots' of the approximate output generated in a stipulated time frame, and (4) supporting cascaded MapReduce jobs. This cloud implementation is lightweight and inherently scalable.
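The pipelining idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: an in-memory `queue.Queue` stands in for whatever queueing layer CMR uses, and the names `mapper`, `PipelinedReducer`, `snapshot`, and `run_pipelined_word_count` are hypothetical. The sketch shows a word-count mapper handing each pair to the reducer as soon as it is produced, while the reducer keeps running totals that can be read at any time as an approximate result.

```python
import queue
import threading
from collections import Counter

# Sentinel marking the end of the stream (a convention assumed for this sketch).
_DONE = object()


def mapper(records, out_q):
    """Emit (word, 1) pairs one at a time, as soon as each is produced."""
    for record in records:  # `records` may be a list or a lazy stream
        for word in record.split():
            out_q.put((word.lower(), 1))
    out_q.put(_DONE)


class PipelinedReducer:
    """Consumes pairs while the mapper is still running; keeps running totals."""

    def __init__(self):
        self._counts = Counter()
        self._lock = threading.Lock()

    def run(self, in_q):
        while True:
            item = in_q.get()
            if item is _DONE:
                break
            key, value = item
            with self._lock:
                self._counts[key] += value

    def snapshot(self):
        """Return a copy of the partial (approximate) output seen so far."""
        with self._lock:
            return dict(self._counts)


def run_pipelined_word_count(records):
    """Run mapper and reducer concurrently, connected by a queue."""
    q = queue.Queue()
    reducer = PipelinedReducer()
    consumer = threading.Thread(target=reducer.run, args=(q,))
    producer = threading.Thread(target=mapper, args=(records, q))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return reducer.snapshot()
```

Calling `snapshot()` on the reducer while the mapper is still emitting returns whatever totals have accumulated so far, which is one way to read the "approximate output in a stipulated time frame" described above. The final dictionary could in turn be fed to a second job of the same shape to mimic cascading.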


Pages: 344-349

Number of pages: 6

50 items in total

[1] Research of a MapReduce Communication Data Stream Processing Model
Yang, Wenchuan
Jia, Bei
[J]. PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON CLOUD COMPUTING AND INFORMATION SECURITY (CCIS 2013), 2013, 52 : 28 - 31
[2] Optimizing distributed data stream processing by tracing
Zvara, Zoltan
Szabo, Peter G. N.
Balazs, Barnabas
Benczur, Andras
[J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 90 : 578 - 591
[3] Optimizing Cost and Performance Trade-Offs for MapReduce Job Processing in the Cloud
Zhang, Zhuoyao
Cherkasova, Ludmila
Loo, Boon Thau
[J]. 2014 IEEE NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM (NOMS), 2014
[4] Data stream treatment using sliding windows with MapReduce
Jose Basgall, Maria
Hasperue, Waldo
Naiouf, Marcelo
[J]. JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY, 2016, 16 (02) : 76 - 83
[5] Big Data Processing with harnessing Hadoop - MapReduce for Optimizing Analytical Workloads
Satish, Rama K., V
Kavya, N. P.
[J]. 2014 INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2014 : 49 - 54
[6] An Optimal Model for Optimizing the Placement and Parallelism of Data Stream Processing Applications on Cloud-Edge Computing
de Souza, Felipe Rodrigo
de Assuncao, Marcos Dias
Caron, Eddy
Veith, Alexandre da Silva
[J]. 2020 IEEE 32ND INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2020), 2020 : 59 - 66
[7] Smart MapReduce Cloud: Applying Extra Processing to Intermediate Data on Demand
Huang, Tzu-Chi
Chu, Kuo-Chih
Tsai, Ming-Fong
[J]. 2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014 : 799 - 804
[8] Elastic and Scalable Processing of Linked Stream Data in the Cloud
Le-Phuoc, Danh
Hoan Nguyen Mau Quoc
Le Van, Chan
Hauswirth, Manfred
[J]. SEMANTIC WEB - ISWC 2013, PART I, 2013, 8218 : 280 - 297
[9] Optimizing data stream processing for large-scale applications
Cappellari, Paolo
Roantree, Mark
Chun, Soon Ae
[J]. SOFTWARE-PRACTICE & EXPERIENCE, 2018, 48 (09) : 1607 - 1641
[10] Stream Processing with BigData: SSS-MapReduce
Nakada, Hidemoto
Ogawa, Hirotaka
Kudoh, Tomohiro
[J]. 2012 IEEE 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), 2012
