Data Processing Performance of Apache Spark on Beowulf Clusters: An Overview

Cited by: 0
Authors
Cluci, Marius-Iulian [1]
Fotache, Marin [1]
Greavu-Serban, Valerica [1]
Affiliations
[1] Alexandru Ioan Cuza Univ, Iasi, Romania
Keywords
Big Data; Apache Spark; Beowulf clusters; data processing performance; Big Data analytics
DOI
Not available
Chinese Library Classification (CLC)
F [Economics]
Discipline classification code
02
Abstract
Despite the advent of cloud computing and the democratisation of data processing through parallel and distributed computing, Big Data systems may incur considerable costs that put them out of reach of small and medium-sized companies and of financially stretched organizations, as is the case with many universities. This paper presents preliminary results for data processing tasks (queries) run on the Apache Spark framework deployed on a commodity Beowulf cluster. The association between query duration (the outcome) and several predictors (the number of cluster nodes, the cluster manager, the available cluster RAM, and the database size) was examined.
Pages: 12929 - 12938
Page count: 10
Related papers
50 in total
  • [1] Low Cost Big Data Solutions: The Case of Apache Spark on Beowulf Clusters
    Fotache, Marin
    Cluci, Marius-Iulian
    Greavu-Serban, Valerica
    [J]. PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON INTERNET OF THINGS, BIG DATA AND SECURITY (IOTBDS), 2020, : 327 - 334
  • [2] Big Spatial Data Processing With Apache Spark
    Shangguan, Boyi
    Yue, Peng
    Wu, Zhaoyan
    Jiang, Liangcun
    [J]. 2017 6TH INTERNATIONAL CONFERENCE ON AGRO-GEOINFORMATICS, 2017, : 239 - 242
  • [3] Apache Spark: A Big Data Processing Engine
    Shaikh, Eman
    Mohiuddin, Iman
    Alufaisan, Yasmeen
    Nahvi, Irum
    [J]. 2019 2ND IEEE MIDDLE EAST AND NORTH AFRICA COMMUNICATIONS CONFERENCE (IEEEMENACOMM'19), 2019, : 220 - 225
  • [4] Identifying the potential of Near Data Processing for Apache Spark
    Awan, Ahsan Javed
    Ohara, Moriyoshi
    Ayguade, Eduard
    Ishizaki, Kazuaki
    Brorsson, Mats
    Vlassov, Vladimir
    [J]. MEMSYS 2017: PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON MEMORY SYSTEMS, 2017, : 60 - 67
  • [5] Processing large-scale data with Apache Spark
    Ko, Seyoon
    Won, Joong-Ho
    [J]. KOREAN JOURNAL OF APPLIED STATISTICS, 2016, 29 (06) : 1077 - 1094
  • [6] Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark
    Korkhov, Vladimir
    Gankevich, Ivan
    Iakushkin, Oleg
    Gushchanskiy, Dmitry
    Khmel, Dmitry
    Ivashchenko, Andrey
    Pyayt, Alexander
    Zobnin, Sergey
    Loginov, Alexander
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2017, PT V, 2017, 10408 : 387 - 398
  • [7] Linked Data Partitioning for RDF Processing on Apache Spark
    Atashkar, Amir Hossein
    Ghadiri, Nasser
    Joodaki, Mehdi
    [J]. 2017 3RD INTERNATIONAL CONFERENCE ON WEB RESEARCH (ICWR), 2017, : 73 - 77
  • [8] Apache Spark: A Unified Engine for Big Data Processing
    Zaharia, Matei
    Xin, Reynold S.
    Wendell, Patrick
    Das, Tathagata
    Armbrust, Michael
    Dave, Ankur
    Meng, Xiangrui
    Rosen, Josh
    Venkataraman, Shivaram
    Franklin, Michael J.
    Ghodsi, Ali
    Gonzalez, Joseph
    Shenker, Scott
    Stoica, Ion
    [J]. COMMUNICATIONS OF THE ACM, 2016, 59 (11) : 56 - 65
  • [9] Consideration of Parallel Data Processing over an Apache Spark Cluster
    Kato, Kasumi
    Takefusa, Atsuko
    Nakada, Hidemoto
    Oguchi, Masato
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 4757 - 4759
  • [10] Big Data Network Flow Processing Using Apache Spark
    Jerabek, Kamil
    Rysavy, Ondrej
    [J]. PROCEEDINGS OF THE 6TH CONFERENCE ON THE ENGINEERING OF COMPUTER BASED SYSTEMS (ECBS 2019), 2020,