Performance Evaluation of Big Data Frameworks: MapReduce and Spark

Cited by: 0
Authors
Singh, Jaspreet [1]
Panda, S. N. [1]
Kaushal, Rajesh [1]
Affiliations
[1] Chitkara Univ, Inst Engn & Technol, Dept Comp Sci & Engn, Rajpura, Punjab, India
Keywords
Hadoop; Spark; MapReduce; HDFS; Data analytics
DOI
10.1007/978-981-10-5903-2_167
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spark and MapReduce are two prominent open-source distributed computing frameworks for big data processing and analytics. These frameworks offer simple programming APIs to new users and hide the complexity and fault tolerance of distributed tasks. Most Internet companies deploy these frameworks widely to process their massive data. Furthermore, other large communities are adopting these HPC frameworks because high-performance data analytics is required to solve big data problems. To find an efficient framework for processing and analyzing large amounts of data, researchers today compare the two frameworks. (1) This paper evaluates the performance of MapReduce and Spark on page rank, sort and word count; the page rank and sort evaluations draw on existing research. (2) We provide an in-depth analysis of task execution time for the word count algorithm in both frameworks through detailed experiments, and quantify the performance for different dataset sizes. Overall, the experimental results show that Spark is faster than MapReduce. The main causes of the speedups in Spark are the reduced disk and CPU overheads due to RDD caching.
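For readers unfamiliar with the word count benchmark named in the abstract, the following is a minimal single-process sketch of its map, shuffle and reduce phases in plain Python. It is an illustration only, not the authors' experimental code and not the Hadoop or Spark API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the lowercase, whitespace-based tokenization are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, mimicking the step between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory collection of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

In a real cluster each phase runs in parallel across many nodes. The abstract attributes Spark's advantage to RDD caching, which reduces disk and CPU overheads; this sketch does not model that effect.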
Pages: 1611-1619
Number of pages: 9
Related papers
50 in total
  • [1] Scalability and Realtime on Big Data, MapReduce, NoSQL and Spark
    Furtado, Pedro
    [J]. BUSINESS INTELLIGENCE (EBISS 2016), 2017, 280: 79-104
  • [2] Spark versus Flink: Understanding Performance in Big Data Analytics Frameworks
    Marcu, Ovidiu-Cristian
    Costan, Alexandra
    Antoniu, Gabriel
    Perez-Hernandez, Maria S.
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016: 433-442
  • [3] A Study on Big Data Processing Frameworks: Spark and Storm
    Deshai, N.
    Venkataramana, S.
    Sekhar, B. V. D. S.
    Srinivas, K.
    Varma, G. P. Saradhi
    [J]. SMART INTELLIGENT COMPUTING AND APPLICATIONS, VOL 2, 2020, 160: 415-424
  • [4] Big Data Management Processing with Hadoop MapReduce and Spark Technology: A Comparison
    Verma, Ankush
    Mansuri, Ashik Hussain
    Jain, Neelesh
    [J]. 2016 SYMPOSIUM ON COLOSSAL DATA ANALYSIS AND NETWORKING (CDAN), 2016
  • [5] Performance Analysis of Distributed Computing Frameworks for Big Data Analytics: Hadoop Vs Spark
    Ketu, Shwet
    Mishra, Pramod Kumar
    Agarwal, Sonali
    [J]. COMPUTACION Y SISTEMAS, 2020, 24(02): 669-686
  • [6] A Comparison of Big Remote Sensing Data Processing with Hadoop MapReduce and Spark
    Chebbi, I.
    Boulila, W.
    Mellouli, N.
    Lamolle, M.
    Farah, I. R.
    [J]. 2018 4TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP), 2018
  • [7] Performance Evaluation and Optimization of Join Operation in Spark for Big Data Processing
    Qiu, Deyang
    Zhou, Wenli
    Liu, Jun
    [J]. PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017: 2295-2298
  • [8] Performance Evaluation of Big Data Frameworks for Large-Scale Data Analytics
    Veiga, Jorge
    Exposito, Roberto R.
    Pardo, Xoan C.
    Taboada, Guillermo L.
    Tourino, Juan
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 424-431
  • [9] Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks
    Fernandez, Alberto
    del Rio, Sara
    Lopez, Victoria
    Bawakid, Abdullah
    del Jesus, Maria J.
    Benitez, Jose M.
    Herrera, Francisco
    [J]. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 4(05): 380-409
  • [10] Challenges in High Performance Big Data Frameworks
    Papadopoulos, Alessandro V.
    Maggio, Martina
    [J]. PROCEEDINGS 2018 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2018: 153-156