Computational Performance Analysis of Cluster-based Technologies for Big Data Analytics

Cited by: 2

Authors:

Khan, Mukhtakj [1]

Salman [1]

Iqbal, Nadeem [1]

Affiliations:

[1] Abdul Wali Khan Univ Mardan, Mardan, Pakistan

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON INTERNET OF THINGS (ITHINGS) AND IEEE GREEN COMPUTING AND COMMUNICATIONS (GREENCOM) AND IEEE CYBER, PHYSICAL AND SOCIAL COMPUTING (CPSCOM) AND IEEE SMART DATA (SMARTDATA) | 2017

Keywords:

Big Data; Apache Hadoop; Apache Spark; Distributed Computing; Performance;

DOI:

10.1109/iThings-GreenCom-CPSCom-SmartData.2017.239

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Rapid development of the Internet, applications, and communication technology has led to a huge volume of unstructured data being generated from sources such as social media, sensor networks, online services, healthcare devices, bioinformatics, computational biology, and many others. Storing this volume of data and processing it in a timely manner poses numerous challenges. Distributed computing platforms such as Hadoop MapReduce and Spark are becoming the major programming models for data-intensive applications. In this paper, we compare the performance of the Hadoop MapReduce and Spark programming models in terms of computational efficiency. For this comparison, we employ three applications, WordCount, Sort, and PageRank, with input datasets of varied sizes. The experimental results show that Spark outperforms Hadoop MapReduce in all cases.
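This record does not include the authors' benchmark code. As a rough illustration of the WordCount workload named in the abstract, the following is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It runs on a single machine with no Hadoop or Spark dependency, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the paper's datasets.
    sample = ["big data big clusters", "data on clusters"]
    print(word_count(sample))
```

In a cluster setting, the map and reduce functions would run in parallel across nodes and the shuffle would move data over the network; this sketch only shows the logical structure of the job.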

Pages: 280-286

Page count: 7

