Static and Dynamic Big Data Partitioning on Apache Spark

Cited by: 7
Authors
Bertolucci, Massimiliano [2]
Carlini, Emanuele [1]
Dazzi, Patrizio [1]
Lulli, Alessandro [1,2]
Ricci, Laura [1,2]
Affiliations
[1] CNR, Ist Sci & Tecnol Informaz, Pisa, Italy
[2] Univ Pisa, Dept Comp Sci, Pisa, Italy
Keywords
BigData; Graph algorithms; Data partitioning; Apache Spark;
DOI
10.3233/978-1-61499-621-7-489
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Many of today's large datasets are organized as graphs. Due to their size, it is often infeasible to process these graphs on a single machine. Therefore, many software frameworks and tools have been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how a specific data partitioning strategy affects the performance of graph algorithms executing on Apache Spark. To this end, we implemented several graph algorithms and compared their performance under a naive partitioning solution and under more elaborate strategies, both static and dynamic.
Pages: 489-498
Page count: 10
Related papers
50 in total
  • [21] SparkJNI: A Toolchain for Hardware Accelerated Big Data Apache Spark
    Voicu, Tudor Alexandru
    Al-Ars, Zaid
    2019 4TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS (ICBDA 2019), 2019, : 152 - 157
  • [22] Big Data Machine Learning using Apache Spark MLlib
    Assefi, Mehdi
    Behravesh, Ehsun
    Liu, Guangchi
    Tafti, Ahmad P.
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 3492 - 3498
  • [23] BigDebug: Interactive Debugger for Big Data Analytics in Apache Spark
    Gulzar, Muhammad Ali
    Interlandi, Matteo
    Condie, Tyson
    Kim, Miryung
    FSE'16: PROCEEDINGS OF THE 2016 24TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON FOUNDATIONS OF SOFTWARE ENGINEERING, 2016, : 1033 - 1037
  • [24] Balanced Graph Partitioning with Apache Spark
    Carlini, Emanuele
    Dazzi, Patrizio
    Esposito, Andrea
    Lulli, Alessandro
    Ricci, Laura
    EURO-PAR 2014: PARALLEL PROCESSING WORKSHOPS, PT I, 2014, 8805 : 129 - 140
  • [25] Developing Big Data anomaly dynamic and static detection algorithms: AnomalyDSD spark package
    Garcia-Gil, Diego
    Lopez, David
    Arguelles-Martino, Daniel
    Carrasco, Jacinto
    Aguilera-Martos, Ignacio
    Luengo, Julian
    Herrera, Francisco
    INFORMATION SCIENCES, 2025, 690
  • [26] A Survey of Scheduling Tasks in Big Data: Apache Spark
    Hasan, Balqees Talal
    Abdullah, Dhuha Basheer
    MICRO-ELECTRONICS AND TELECOMMUNICATION ENGINEERING, ICMETE 2021, 2022, 373 : 405 - 414
  • [27] Approx-SMOTE: Fast SMOTE for Big Data on Apache Spark
    Juez-Gil, Mario
    Arnaiz-Gonzalez, Alvar
    Rodriguez, Juan J.
    Lopez-Nozal, Carlos
    Garcia-Osorio, Cesar
    NEUROCOMPUTING, 2021, 464 : 432 - 437
  • [28] Testing of algorithms for anomaly detection in Big data using apache spark
    Lighari, Sheeraz Niaz
    Hussain, Dil Muhammad Akbar
    2017 9TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2017, : 97 - 100
  • [29] PRISPARK: Differential Privacy Enforcement for Big Data Computing in Apache Spark
    Li, Shuailou
    Wen, Yu
    Xue, Tao
    Wang, Zhaoyang
    Wu, Yanna
    Meng, Dan
    2023 42ND INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, SRDS 2023, 2023, : 93 - 106
  • [30] Mobile Big Data Analytics Using Deep Learning and Apache Spark
    Abu Alsheikh, Mohammad
    Niyato, Dusit
    Lin, Shaowei
    Tan, Hwee-Pink
    Han, Zhu
    IEEE NETWORK, 2016, 30 (03): : 22 - 29