Static and Dynamic Big Data Partitioning on Apache Spark

Cited by: 7
Authors
Bertolucci, Massimiliano [2]
Carlini, Emanuele [1]
Dazzi, Patrizio [1]
Lulli, Alessandro [1, 2]
Ricci, Laura [1, 2]
Affiliations
[1] CNR, Ist Sci & Tecnol Informaz, Pisa, Italy
[2] Univ Pisa, Dept Comp Sci, Pisa, Italy
Keywords
BigData; Graph algorithms; Data partitioning; Apache Spark
DOI
10.3233/978-1-61499-621-7-489
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Many of today's large datasets are organized as graphs. Due to their size, it is often infeasible to process these graphs on a single machine. Therefore, many software frameworks and tools have been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how a specific data partitioning strategy affects the performance of graph algorithms executing on Apache Spark. To this end, we implemented different graph algorithms and compared their performance using a naive partitioning solution against more elaborate strategies, both static and dynamic.
Pages: 489-498
Page count: 10