Optimizing Cloud MapReduce for Processing Stream Data using Pipelining

Cited by: 5
Authors
Karve, Rutvik [1]
Dahiphale, Devendra [1]
Chhajer, Amit [1]
Affiliations
[1] Pune Inst Comp Technol, Dept Comp Engn, Pune, Maharashtra, India
Keywords
MapReduce; Cloud Computing; Pipelining; Stream Processing; Distributed Computing;
DOI
10.1109/EMS.2011.76
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Cloud MapReduce (CMR) is a framework for processing large sets of batch data in the cloud. The Map and Reduce phases run sequentially, one after the other. This leads to (1) compulsory batch processing, (2) no parallelization of the Map and Reduce phases, and (3) increased delays. The current implementation is therefore not suited to processing streaming data. We propose a novel architecture that supports streaming data as input by pipelining between the Map and Reduce phases in CMR, so that the output of the Map phase is made available to the Reduce phase as soon as it is produced. This 'Pipelined MapReduce' approach increases parallelism between the Map and Reduce phases, thereby (1) supporting streaming data as input, (2) reducing delays, (3) enabling the user to take 'snapshots' of the approximate output generated in a stipulated time frame, and (4) supporting cascaded MapReduce jobs. This cloud implementation is lightweight and inherently scalable.
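The pipelining idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: an in-process `queue.Queue` stands in for whatever transport CMR uses between phases, word counting is an assumed example job, and the names `map_words`, `reduce_counts`, `snapshot`, and `run_pipeline` are hypothetical. The point it shows is that the reducer consumes each map output as soon as it is produced, and that a running total can be read at any time as an approximate result.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel: the mapper has no more output


def map_words(records, out_q):
    """Map phase: emit (word, 1) pairs one at a time, as each record is read."""
    for record in records:
        for word in record.split():
            out_q.put((word.lower(), 1))
    out_q.put(_DONE)


def reduce_counts(in_q, totals, lock):
    """Reduce phase: fold pairs into running totals while the mapper is still running."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        key, value = item
        with lock:
            totals[key] += value


def snapshot(totals, lock):
    """Return a copy of the totals so far (approximate until the stream ends)."""
    with lock:
        return dict(totals)


def run_pipeline(records):
    """Run mapper and reducer concurrently and return the final counts."""
    q = queue.Queue()
    totals = Counter()
    lock = threading.Lock()
    reducer = threading.Thread(target=reduce_counts, args=(q, totals, lock))
    mapper = threading.Thread(target=map_words, args=(records, q))
    reducer.start()
    mapper.start()
    mapper.join()
    reducer.join()
    return snapshot(totals, lock)
```

Because `records` can be any iterable, including a generator that yields data as it arrives, the same structure accepts a stream rather than a fixed batch. Calling `snapshot` from another thread while the pipeline is running would return the partial counts accumulated so far.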
Pages: 344-349
Number of pages: 6
Related papers
50 in total
  • [11] Optimizing VM Provisioning of MapReduce Tasks on Public Cloud
    Kaur, Banpreet
    Grover, Ankit
    [J]. INTERNATIONAL CONFERENCE ON ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY & COMPUTING, 2016, 2016,
  • [12] Curracurrong Cloud: Stream Processing in the Cloud
    Kakkad, Vasvi
    Dey, Akon
    Fekete, Alan
    Scholz, Bernhard
    [J]. 2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW), 2014, : 207 - 214
  • [13] Handling Big Data Using MapReduce Over Hybrid Cloud
    Saxena, Ankur
    Chaurasia, Ankur
    Kaushik, Neeraj
    Kaushik, Nidhi
    [J]. INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING AND COMMUNICATIONS, VOL 2, 2019, 56 : 135 - 144
  • [14] Service Management in the Edge Cloud for Stream Processing of IoT Data
    Moussa, Hachem
    Yen, I-Ling
    Bastani, Farokh
    [J]. 2020 IEEE 13TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD 2020), 2020, : 91 - 98
  • [15] Spatial Data Processing with MapReduce
    Gunawardena, Tilani
    Vicari, Annamaria
    Mecca, Giansalvatore
    [J]. 2015 IEEE 10TH INTERNATIONAL CONFERENCE ON INDUSTRIAL AND INFORMATION SYSTEMS (ICIIS), 2015, : 485 - 490
  • [16] An improved partitioning mechanism for optimizing massive data analysis using MapReduce
    Slagter, Kenn
    Hsu, Ching-Hsien
    Chung, Yeh-Ching
    Zhang, Daqiang
    [J]. JOURNAL OF SUPERCOMPUTING, 2013, 66 (01): : 539 - 555
  • [17] An improved partitioning mechanism for optimizing massive data analysis using MapReduce
    Kenn Slagter
    Ching-Hsien Hsu
    Yeh-Ching Chung
    Daqiang Zhang
    [J]. The Journal of Supercomputing, 2013, 66 : 539 - 555
  • [18] Automating Platform Selection for MapReduce Processing in the Cloud
    Zhang, Zhuoyao
    Cherkasova, Ludmila
    Boon Thau Loo
    [J]. 2015 INTERNATIONAL CONFERENCE ON CLOUD AND AUTONOMIC COMPUTING (ICCAC), 2015, : 125 - 136
  • [19] Simplifying MapReduce data processing
    Liao, Chih-Shan
    Shih, Jin-Ming
    Chang, Ruay-Shiung
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2013, 8 (03) : 219 - 226
  • [20] AJIRA: a Lightweight Distributed Middleware for MapReduce and Stream Processing
    Urbani, Jacopo
    Margara, Alessandro
    Jacobs, Ceriel
    Voulgaris, Spyros
    Bal, Henri
    [J]. 2014 IEEE 34TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2014), 2014, : 545 - 554