Fast execution of RDF queries using Apache Hadoop

Cited by: 0

Authors:

Mazumdar, Somnath [1]

Scionti, Alberto [2]

Affiliations:

[1] Univ Siena, Dept Informat Engn & Math, Siena, Italy

[2] Ist Super Mario Boella ISMB, Turin, Italy

Source:

ADVANCES IN COMPUTERS, VOL 119 | 2020 / Vol. 119

Keywords:

SPARQL; ENGINE;

DOI:

10.1016/bs.adcom.2020.03.001

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Map-Reduce (MR) is a distributed programming framework that has become very popular since its introduction because of its ability to process massive data sets. MR provides a robust and straightforward mechanism for implementing distributed applications without having to deal with many of the management aspects of parallel programming (e.g., instantiating jobs, distributing data, and synchronizing jobs). The Resource Description Framework (RDF), thanks to its simplicity and flexibility, can represent semistructured and unstructured data, which are essential for representing web semantics. SPARQL is a query language for retrieving and manipulating data stored in RDF format, and it also supports "Big Data" applications. In this book chapter, we present a framework designed to evaluate complex SPARQL queries quickly. To speed up the execution of SPARQL queries, we implemented the query engine on the Hadoop framework. The engine can handle large and complex queries involving multiple join variables over large RDF data sets. A further speedup is obtained by preprocessing the input data with parallel Bloom filters. The query engine has been tested on the SP2 benchmark, and the results demonstrate the benefits of the design: the smallest per-query improvement is 5%, while the largest is 82%.
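The abstract combines two ingredients: a join over a shared SPARQL variable executed in Map-Reduce style, and a Bloom-filter preprocessing pass that discards input that cannot contribute to the join. The Python sketch below is a hypothetical, single-process illustration of how these two ideas fit together. It is not the authors' Hadoop engine. All names (`BloomFilter`, `match`, `bloom_join`) and parameters (`m_bits`, `k_hashes`) are assumptions made for illustration, and the chapter's actual filter sizing, parallelization, and job layout are not reflected here.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Illustrative Bloom filter: k hash positions over an m-slot bit array."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 digest with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p] for p in self._positions(item))


def match(triples, pattern):
    """Yield variable bindings for one triple pattern; variables start with '?'."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break  # constant term does not match
        else:
            yield binding


def bloom_join(triples, left_pat, right_pat, var):
    """Join two triple patterns on `var` in a map/shuffle/reduce style,
    first pruning right-side bindings whose join value cannot occur on the left."""
    left = list(match(triples, left_pat))
    bf = BloomFilter()
    for b in left:
        bf.add(b[var])
    right = [b for b in match(triples, right_pat) if bf.might_contain(b[var])]

    # "Map" + "shuffle": group bindings from each side by their join value.
    groups = defaultdict(lambda: ([], []))
    for b in left:
        groups[b[var]][0].append(b)
    for b in right:
        groups[b[var]][1].append(b)

    # "Reduce": combine compatible bindings within each join-key group.
    results = []
    for lefts, rights in groups.values():
        for lb in lefts:
            for rb in rights:
                if all(lb[v] == rb[v] for v in lb.keys() & rb.keys()):
                    results.append({**lb, **rb})
    return results


triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("dave", "name", "Dave"),
]
rows = bloom_join(triples, ("?x", "knows", "?y"), ("?y", "name", "?n"), "?y")
print(rows)  # [{'?x': 'alice', '?y': 'bob', '?n': 'Bob'}]
```

Because a Bloom filter never yields false negatives, the pruning step cannot drop a valid join result; a false positive only lets an unmatched binding through to the reduce phase, where it finds no partner and is discarded. The saving comes from shuffling fewer bindings, which is the kind of cost that matters in a distributed Map-Reduce job.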

Pages: 1-33

Page count: 33
