Static and Dynamic Big Data Partitioning on Apache Spark

Cited by: 7
Authors
Bertolucci, Massimiliano [2]
Carlini, Emanuele [1]
Dazzi, Patrizio [1]
Lulli, Alessandro [1, 2]
Ricci, Laura [1, 2]
Affiliations
[1] CNR, Ist Sci & Tecnol Informaz, Pisa, Italy
[2] Univ Pisa, Dept Comp Sci, Pisa, Italy
Keywords
BigData; Graph algorithms; Data partitioning; Apache Spark
DOI
10.3233/978-1-61499-621-7-489
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Many of today's large datasets are organized as graphs. Because of their size, it is often infeasible to process these graphs on a single machine. Many software frameworks and tools have therefore been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how a specific data partitioning strategy affects the performance of graph algorithms executing on Apache Spark. To this end, we implemented different graph algorithms and compared their performance using a naive partitioning solution against more elaborate strategies, both static and dynamic.
Pages: 489-498
Number of pages: 10