Optimizing Cloud MapReduce for Processing Stream Data using Pipelining

Cited by: 5
Authors
Karve, Rutvik [1]
Dahiphale, Devendra [1]
Chhajer, Amit [1]
Affiliations
[1] Pune Inst Comp Technol, Dept Comp Engn, Pune, Maharashtra, India
Keywords
MapReduce; Cloud Computing; Pipelining; Stream Processing; Distributed Computing;
DOI
10.1109/EMS.2011.76
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification codes
081203; 0835;
Abstract
Cloud MapReduce (CMR) is a framework for processing large batch data sets in the cloud. Its Map and Reduce phases run sequentially, one after the other. This leads to (1) compulsory batch processing, (2) no parallelization of the Map and Reduce phases, and (3) increased delays. The current implementation is therefore not suited to processing streaming data. We propose a novel architecture that supports streaming data as input by pipelining between the Map and Reduce phases in CMR, ensuring that the output of the Map phase is made available to the Reduce phase as soon as it is produced. This 'Pipelined MapReduce' approach increases parallelism between the Map and Reduce phases, thereby (1) supporting streaming data as input, (2) reducing delays, (3) enabling the user to take 'snapshots' of the approximate output generated within a stipulated time frame, and (4) supporting cascaded MapReduce jobs. This cloud implementation is lightweight and inherently scalable.
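The pipelining idea in the abstract can be illustrated with a small, single-machine sketch. This is not the authors' implementation: the in-process `queue.Queue` merely stands in for whatever transport CMR uses between phases, word counting is an assumed example job, and the names `map_words`, `PipelinedReducer`, `snapshot`, and `run_pipeline` are hypothetical. It shows only the core behavior described above: the Reduce side consumes each Map output as soon as it is produced, and a partial (approximate) result can be read at any time.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # hypothetical sentinel marking the end of the input stream


def map_words(lines, out_q):
    """Map phase: emit (word, 1) pairs immediately, one line at a time."""
    for line in lines:
        for word in line.split():
            out_q.put((word.lower(), 1))
    out_q.put(_DONE)


class PipelinedReducer:
    """Reduce phase: folds pairs into running totals as they arrive."""

    def __init__(self, in_q):
        self._in_q = in_q
        self._counts = Counter()
        self._lock = threading.Lock()

    def run(self):
        while True:
            item = self._in_q.get()  # blocks until the mapper emits something
            if item is _DONE:
                break
            key, value = item
            with self._lock:
                self._counts[key] += value

    def snapshot(self):
        """Return a copy of the totals so far (approximate while running)."""
        with self._lock:
            return dict(self._counts)


def run_pipeline(lines):
    """Run Map and Reduce concurrently and return the final counts."""
    q = queue.Queue()
    reducer = PipelinedReducer(q)
    mapper_thread = threading.Thread(target=map_words, args=(lines, q))
    reducer_thread = threading.Thread(target=reducer.run)
    reducer_thread.start()
    mapper_thread.start()
    mapper_thread.join()
    reducer_thread.join()
    return reducer.snapshot()
```

Because `lines` can be any iterable, including a generator over an unbounded source, the mapper never needs the whole input up front. Calling `snapshot()` from another thread while the pipeline is still running would return the partial counts accumulated so far, which corresponds loosely to the 'snapshots' of approximate output mentioned in the abstract.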
Pages: 344-349
Page count: 6