Static and Dynamic Big Data Partitioning on Apache Spark

Cited by: 7
Authors
Bertolucci, Massimiliano [2]
Carlini, Emanuele [1]
Dazzi, Patrizio [1]
Lulli, Alessandro [1, 2]
Ricci, Laura [1, 2]
Affiliations
[1] CNR, Ist Sci & Tecnol Informaz, Pisa, Italy
[2] Univ Pisa, Dept Comp Sci, Pisa, Italy
Keywords
BigData; Graph algorithms; Data partitioning; Apache Spark
DOI
10.3233/978-1-61499-621-7-489
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Many of today's large datasets are organized as graphs. Because of their size, it is often infeasible to process these graphs on a single machine. Therefore, many software frameworks and tools have been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how a specific data partitioning strategy affects the performance of graph algorithms executing on Apache Spark. To this end, we implemented different graph algorithms and compared their performance using a naive partitioning solution against more elaborate strategies, both static and dynamic.
Pages: 489-498
Number of pages: 10