Computational Performance Analysis of Cluster-based Technologies for Big Data Analytics

Cited by: 2
Authors
Khan, Mukhtakj [1]
Salman [1]
Iqbal, Nadeem [1]
Affiliations
[1] Abdul Wali Khan Univ Mardan, Mardan, Pakistan
Keywords
Big Data; Apache Hadoop; Apache Spark; Distributed Computing; Performance
DOI
10.1109/iThings-GreenCom-CPSCom-SmartData.2017.239
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Due to rapid development in the Internet, applications, and communication technology, a huge volume of unstructured data is generated from sources such as social media, sensor networks, online services, healthcare devices, bioinformatics, computational biology, and many more. However, this volume of data poses numerous challenges in terms of storage and timely processing. Distributed computing platforms such as Hadoop MapReduce and Spark are becoming major programming models for data-intensive applications. In this paper, we compare the performance of the Hadoop MapReduce and Spark programming models in terms of computational efficiency. For this comparison, we employ three applications, WordCount, Sort, and PageRank, with input datasets of varied sizes. The experimental results show that Spark outperforms Hadoop MapReduce in all cases.
Pages: 280-286
Page count: 7