Parallel Data Processing with MapReduce: A Survey

Cited by: 310
Authors
Lee, Kyong-Ha [1]
Lee, Yoon-Joon [1]
Choi, Hyunsik [2]
Chung, Yon Dohn [2]
Moon, Bongki [3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon, South Korea
[2] Korea Univ, Dept Comp Sci & Engn, Seoul, South Korea
[3] Univ Arizona, Dept Comp Sci, Tucson, AZ 85721 USA
Keywords
MAP-REDUCE; PERFORMANCE; MANAGEMENT; TOP
DOI
10.1145/2094114.2094118
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
MapReduce, a prominent parallel data processing tool, is gaining significant momentum in both industry and academia as the volume of data to analyze grows rapidly. While MapReduce is used in many areas where massive data analysis is required, there are still debates about its performance, efficiency per node, and simple abstraction. This survey is intended to help the database and open source communities understand various technical aspects of the MapReduce framework. We characterize the MapReduce framework and discuss its inherent pros and cons. We then introduce optimization strategies for it reported in the recent literature. We also discuss the open issues and challenges raised by parallel data analysis with MapReduce.
Pages: 11-20
Page count: 10
Related papers
50 in total
  • [1] A Survey of MapReduce based Parallel Processing Technologies
    Lu Jiamin
    Feng Jun
    [J]. CHINA COMMUNICATIONS, 2014, 11 (02) : 146 - 155
  • [2] Parallel Processing of Massive EEG Data with MapReduce
    Wang, Lizhe
    Chen, Dan
    Ranjan, Rajiv
    Khan, Samee U.
    Kolodziej, Joanna
    Wang, Jun
    [J]. PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012, : 164 - 171
  • [3] Parallel Data Processing in Dynamic Hybrid Computing Environment Using MapReduce
    Tang, Bing
    He, Haiwu
    Fedak, Gilles
    [J]. ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2014, PT II, 2014, 8631 : 1 - 14
  • [4] Analysis of Massive Industrial Data using MapReduce Framework for Parallel Processing
    Aly, Mohab
    Yacout, Soumaya
    Shaban, Yasser
    [J]. 2017 ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM, 2017,
  • [5] A Survey on Geographically Distributed Big-Data Processing Using MapReduce
    Dolev, Shlomi
    Florissi, Patricia
    Gudes, Ehud
    Sharma, Shantanu
    Singer, Ido
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2019, 5 (01) : 60 - 80
  • [6] Parallel Processing Systems for Big Data: A Survey
    Zhang, Yunquan
    Cao, Ting
    Li, Shigang
    Tian, Xinhui
    Yuan, Liang
    Jia, Haipeng
    Vasilakos, Athanasios V.
    [J]. PROCEEDINGS OF THE IEEE, 2016, 104 (11) : 2114 - 2136
  • [7] Parallel Processing of Big Data using Power Iteration Clustering over MapReduce
    Jayalatchumy, D.
    Thambidurai, P.
    Alamelu, A. Vasumathi
    [J]. 2014 WORLD CONGRESS ON COMPUTING AND COMMUNICATION TECHNOLOGIES (WCCCT 2014), 2014, : 176 - 178
  • [8] Spatial Data Processing with MapReduce
    Gunawardena, Tilani
    Vicari, Annamaria
    Mecca, Giansalvatore
    [J]. 2015 IEEE 10TH INTERNATIONAL CONFERENCE ON INDUSTRIAL AND INFORMATION SYSTEMS (ICIIS), 2015, : 485 - 490
  • [9] Simplifying MapReduce data processing
    Liao, Chih-Shan
    Shih, Jin-Ming
    Chang, Ruay-Shiung
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2013, 8 (03) : 219 - 226
  • [10] P2P-MapReduce: Parallel data processing in dynamic Cloud environments
    Marozzo, Fabrizio
    Talia, Domenico
    Trunfio, Paolo
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2012, 78 (05) : 1382 - 1402