Data Processing Performance of Apache Spark on Beowulf Clusters: An Overview

Cited by: 0

Authors:

Cluci, Marius-Iulian [1]

Fotache, Marin [1]

Greavu-Serban, Valerica [1]

Affiliations:

[1] Alexandru Ioan Cuza Univ, Iasi, Romania

Source:

VISION 2025: EDUCATION EXCELLENCE AND MANAGEMENT OF INNOVATIONS THROUGH SUSTAINABLE ECONOMIC COMPETITIVE ADVANTAGE | 2019

Keywords:

Big Data; Apache Spark; Beowulf clusters; data processing performance; BIG DATA ANALYTICS;

DOI:

Not available

Chinese Library Classification (CLC):

F [Economics];

Discipline classification code:

02 ;

Abstract:

Despite the advent of cloud computing and the democratisation of data processing through parallel and distributed computing, Big Data systems may incur considerable costs that put them out of reach of small and medium-sized companies and of financially stretched organizations (as is the case with many universities). This paper presents preliminary results for data processing tasks (queries) run on the Apache Spark framework deployed on a commodity Beowulf cluster. The association between query duration (the outcome) and several predictors, such as the number of cluster nodes, the cluster manager, the RAM available to the cluster, and the database size, was examined.

Pages: 12929 - 12938

Page count: 10

50 entries in total

[21] Highest Order Voronoi Processing on Apache Spark
Pradnyana, Putu Eka Budi
Adhinugraha, Kiki Maulana
Alamri, Sultan
[J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2018, PT I, 2018, 10960 : 169 - 182
[22] FITS Data Source for Apache Spark
Peloton J.
Arnault C.
Plaszczynski S.
[J]. Computing and Software for Big Science, 2018, 2 (1)
[23] Big data analytics on Apache Spark
Salloum S.
Dautov R.
Chen X.
Peng P.X.
Huang J.Z.
[J]. International Journal of Data Science and Analytics, 2016, 1 (3-4) : 145 - 164
[24] Performance Prediction for Apache Spark Platform
Wang, Kewen
Khan, Mohammad Maifi Hasan
[J]. 2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2015, : 166 - 173
[25] A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench
N. Ahmed
Andre L. C. Barczak
Teo Susnjak
Mohammed A. Rashid
[J]. Journal of Big Data, 7
[26] Efficient Performance Prediction for Apache Spark
Cheng, Guoli
Ying, Shi
Wang, Bingming
Li, Yuhang
[J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 149 : 40 - 51
[27] A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench
Ahmed, N.
Barczak, Andre L. C.
Susnjak, Teo
Rashid, Mohammed A.
[J]. JOURNAL OF BIG DATA, 2020, 7 (01)
[28] Learning-Based Dynamic Memory Allocation Schemes for Apache Spark Data Processing
Jia, Danlin
Wang, Li
Valencia, Natalia
Bhimani, Janki
Sheng, Bo
Mi, Ningfang
[J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2024, 12 (01) : 13 - 25
[29] Real-time Processing of IoT Events with Historic data using Apache Kafka and Apache Spark with Dashing framework
D'silva, Godson Michael
Khan, Azharuddin
Joshi, Gaurav
Bari, Siddhesh
[J]. 2017 2ND IEEE INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRONICS, INFORMATION & COMMUNICATION TECHNOLOGY (RTEICT), 2017, : 1804 - 1809
[30] Using Apache Spark to Collect Analytic from the Streaming Data Processing Application Logs
Evgenyevich, Golovanov Mikhail
Valerievich, Bakulev Aleksandr
Alekseevna, Bakuleva Marina
[J]. 2018 7TH MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 2018, : 238 - 241