Static and Dynamic Big Data Partitioning on Apache Spark

Cited by: 7

Authors:

Bertolucci, Massimiliano [2]

Carlini, Emanuele [1]

Dazzi, Patrizio [1]

Lulli, Alessandro [1, 2]

Ricci, Laura [1, 2]

Affiliations:

[1] CNR, Ist Sci & Tecnol Informaz, Pisa, Italy

[2] Univ Pisa, Dept Comp Sci, Pisa, Italy

Source:

PARALLEL COMPUTING: ON THE ROAD TO EXASCALE | 2016 / Vol. 27

Keywords:

BigData; Graph algorithms; Data partitioning; Apache Spark

DOI:

10.3233/978-1-61499-621-7-489

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Many of today's large datasets are organized as graphs. Because of their size, it is often infeasible to process these graphs on a single machine. Many software frameworks and tools have therefore been proposed to process graphs on top of distributed infrastructures. These frameworks are often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how specific data partitioning strategies affect the performance of graph algorithms executing on Apache Spark. To this end, we implemented several graph algorithms and compared their performance under a naive partitioning solution and under more elaborate strategies, both static and dynamic.
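The abstract contrasts a naive partitioning solution with more elaborate static strategies. The following is a hypothetical illustration of that contrast, not the strategies the authors evaluated. It counts cut edges (edges whose endpoints fall in different partitions) on a 12-vertex ring graph under (a) modulo-hash placement, which mirrors what Spark's HashPartitioner computes for non-negative integer keys, and (b) a simple breadth-first, locality-preserving static placement. The function names hash_partition, bfs_partition and edge_cut are assumptions introduced for illustration, and no dynamic (runtime-adaptive) strategy is shown.

```python
from collections import deque


def hash_partition(vertices, k):
    """Naive placement: vertex id modulo k. For non-negative integer keys this
    matches what Spark's HashPartitioner computes (hashCode modulo k)."""
    return {v: v % k for v in vertices}


def bfs_partition(adj, k):
    """Hypothetical static, locality-aware placement (illustration only):
    grow partitions by breadth-first search, moving on to the next partition
    once the current one holds ceil(n / k) vertices."""
    capacity = -(-len(adj) // k)  # ceil(n / k)
    assignment, part, filled = {}, 0, 0
    for seed in sorted(adj):  # restart BFS in each unvisited component
        if seed in assignment:
            continue
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            if v in assignment:  # a vertex may be queued more than once
                continue
            assignment[v] = part
            filled += 1
            if filled == capacity:
                part, filled = part + 1, 0
            queue.extend(w for w in sorted(adj[v]) if w not in assignment)
    return assignment


def edge_cut(edges, assignment):
    """Number of edges whose endpoints sit in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Example: a ring of 12 vertices split into 3 partitions.
n, k = 12, 3
edges = [(i, (i + 1) % n) for i in range(n)]
adj = {i: set() for i in range(n)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(edge_cut(edges, hash_partition(range(n), k)))  # 12: every edge is cut
print(edge_cut(edges, bfs_partition(adj, k)))        # 4
```

In vertex-centric graph algorithms, each cut edge roughly corresponds to communication between partitions, which is one reason a graph-aware placement can outperform naive hashing. Actual gains depend on the algorithm, the graph and the cluster.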


Pages: 489-498

Page count: 10
