Data Processing Performance of Apache Spark on Beowulf Clusters: An Overview

Cited by: 0
Authors
Cluci, Marius-Iulian [1]
Fotache, Marin [1]
Greavu-Serban, Valerica [1]
Affiliations
[1] Alexandru Ioan Cuza Univ, Iasi, Romania
Keywords
Big Data; Apache Spark; Beowulf clusters; data processing performance; BIG DATA ANALYTICS;
DOI
Not available
Chinese Library Classification (CLC) number
F [Economics];
Discipline classification code
02;
Abstract
Despite the advent of cloud computing and the democratisation of data processing through parallel and distributed computing, Big Data systems may incur considerable costs that put them out of reach of small and medium-sized companies and of financially stretched organizations, as is the case with many universities. This paper presents preliminary results of data processing tasks (queries) for the Apache Spark framework deployed on a commodity Beowulf cluster. The association between query duration (the outcome) and several predictors, such as the number of cluster nodes, the cluster manager, the available cluster RAM, and the database size, was examined.
Pages: 12929 - 12938
Number of pages: 10
Related papers
50 in total
  • [21] Highest Order Voronoi Processing on Apache Spark
    Pradnyana, Putu Eka Budi
    Adhinugraha, Kiki Maulana
    Alamri, Sultan
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2018, PT I, 2018, 10960 : 169 - 182
  • [22] FITS Data Source for Apache Spark
    Peloton J.
    Arnault C.
    Plaszczynski S.
    [J]. Computing and Software for Big Science, 2018, 2 (1)
  • [23] Big data analytics on Apache Spark
    Salloum S.
    Dautov R.
    Chen X.
    Peng P.X.
    Huang J.Z.
    [J]. International Journal of Data Science and Analytics, 2016, 1 (3-4) : 145 - 164
  • [24] Performance Prediction for Apache Spark Platform
    Wang, Kewen
    Khan, Mohammad Maifi Hasan
    [J]. 2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2015, : 166 - 173
  • [25] A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench
    N. Ahmed
    Andre L. C. Barczak
    Teo Susnjak
    Mohammed A. Rashid
    [J]. Journal of Big Data, 7
  • [26] Efficient Performance Prediction for Apache Spark
    Cheng, Guoli
    Ying, Shi
    Wang, Bingming
    Li, Yuhang
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 149 : 40 - 51
  • [27] A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench
    Ahmed, N.
    Barczak, Andre L. C.
    Susnjak, Teo
    Rashid, Mohammed A.
    [J]. JOURNAL OF BIG DATA, 2020, 7 (01)
  • [28] Learning-Based Dynamic Memory Allocation Schemes for Apache Spark Data Processing
    Jia, Danlin
    Wang, Li
    Valencia, Natalia
    Bhimani, Janki
    Sheng, Bo
    Mi, Ningfang
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2024, 12 (01) : 13 - 25
  • [29] Real-time Processing of IoT Events with Historic data using Apache Kafka and Apache Spark with Dashing framework
    D'silva, Godson Michael
    Khan, Azharuddin
    Joshi, Gaurav
    Bari, Siddhesh
    [J]. 2017 2ND IEEE INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRONICS, INFORMATION & COMMUNICATION TECHNOLOGY (RTEICT), 2017, : 1804 - 1809
  • [30] Using Apache Spark to Collect Analytic from the Streaming Data Processing Application Logs
    Evgenyevich, Golovanov Mikhail
    Valerievich, Bakulev Aleksandr
    Alekseevna, Bakuleva Marina
    [J]. 2018 7TH MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 2018, : 238 - 241