Data Processing Performance of Apache Spark on Beowulf Clusters: An Overview

Cited by: 0

Authors:

Cluci, Marius-Iulian [1]

Fotache, Marin [1]

Greavu-Serban, Valerica [1]

Affiliation:

[1] Alexandru Ioan Cuza Univ, Iasi, Romania

Source:

VISION 2025: EDUCATION EXCELLENCE AND MANAGEMENT OF INNOVATIONS THROUGH SUSTAINABLE ECONOMIC COMPETITIVE ADVANTAGE | 2019

Keywords:

Big Data; Apache Spark; Beowulf clusters; data processing performance; Big Data analytics

DOI:

Not available

Chinese Library Classification (CLC) code:

F [Economics]

Discipline classification code:

02

Abstract:

Despite the advent of cloud computing and the democratisation of data processing through parallel and distributed computing, Big Data systems may incur considerable costs that make them inaccessible to small and medium-sized companies, and also to organizations that are financially stretched (as is the case with many universities). This paper presents preliminary results of data processing tasks (queries) for the Apache Spark framework deployed on a commodity Beowulf cluster. The association between query duration (the outcome) and several predictors, such as the number of cluster nodes, the cluster manager, the RAM available to the cluster, and the database size, was examined.
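
The abstract does not state which statistical method was used to examine the association between query duration and the predictors. As one possible illustration only, the sketch below computes a Pearson correlation between each numeric predictor and query duration. The `pearson` helper, the `runs` records, and all numbers in them are hypothetical placeholders, not measurements or methods from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up benchmark records (NOT results from the paper).
# Field names are assumptions chosen to mirror the predictors in the abstract.
runs = [
    {"nodes": 2, "ram_gb": 8, "db_size_gb": 1, "duration_s": 40.0},
    {"nodes": 4, "ram_gb": 16, "db_size_gb": 1, "duration_s": 25.0},
    {"nodes": 2, "ram_gb": 8, "db_size_gb": 10, "duration_s": 310.0},
    {"nodes": 4, "ram_gb": 16, "db_size_gb": 10, "duration_s": 180.0},
    {"nodes": 6, "ram_gb": 24, "db_size_gb": 10, "duration_s": 130.0},
]

durations = [r["duration_s"] for r in runs]
for predictor in ("nodes", "ram_gb", "db_size_gb"):
    values = [r[predictor] for r in runs]
    print(predictor, round(pearson(values, durations), 3))
```

A categorical predictor such as the cluster manager would need a different treatment (for example, comparing duration distributions per manager), and a regression model would be needed to separate the effects of correlated predictors such as node count and available RAM.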

Pages: 12929-12938

Page count: 10
