Fast execution of RDF queries using Apache Hadoop

Authors
Mazumdar, Somnath [1]
Scionti, Alberto [2]
Affiliations
[1] Univ Siena, Dept Informat Engn & Math, Siena, Italy
[2] Ist Super Mario Boella ISMB, Turin, Italy
Source
Keywords
SPARQL; ENGINE;
DOI
10.1016/bs.adcom.2020.03.001
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Map-Reduce (MR) is a distributed programming framework that has been very popular since its introduction because of its ability to process massive data sets. MR provides a robust and straightforward mechanism for implementing distributed applications without worrying much about the many management aspects of parallel programming (e.g., instantiating jobs, data distribution, job synchronization). The Resource Description Framework (RDF), with its simplicity and flexibility, can represent semistructured and unstructured data, which are very important for representing web semantics. SPARQL is a query language for retrieving and manipulating data stored in RDF format, and it also supports "Big Data" applications. In this book chapter, we present a framework designed to evaluate complex SPARQL queries quickly. To improve the execution of SPARQL queries, we implemented the query engine on the Hadoop framework. The engine can handle large and complex queries involving multiple join variables while running on large RDF data sets. Further execution speedup is obtained by preprocessing the input data with parallel Bloom filters. The query engine has been tested on the SP2 benchmark, and the results demonstrate the benefits of the design: the minimum query improvement is 5%, while the maximum improvement achieved is 82%.
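To make the Bloom-filter preprocessing idea in the abstract concrete, the following is a minimal single-process sketch, not the chapter's actual Hadoop implementation. It assumes the filter is built over the join-variable values (here, triple subjects) of one triple pattern and used to discard triples of the other pattern before the join. The class `BloomFilter`, the helpers `prefilter` and `join_on_subject`, the filter parameters, and the sample triples are all hypothetical names and values chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def prefilter(triples, bloom):
    """Keep only triples whose subject might appear on the other join side."""
    return [t for t in triples if t[0] in bloom]


def join_on_subject(left, right):
    """Exact hash join on the subject; removes any Bloom false positives."""
    by_subject = {}
    for s, _, o in left:
        by_subject.setdefault(s, []).append(o)
    return [(s, lo, ro) for s, _, ro in right for lo in by_subject.get(s, [])]


# Hypothetical triples (subject, predicate, object) for two triple patterns.
creators = [
    ("ex:article1", "dc:creator", "ex:alice"),
    ("ex:article2", "dc:creator", "ex:bob"),
]
titles = [
    ("ex:article1", "dc:title", "Title A"),
    ("ex:book9", "dc:title", "Title B"),
]

bloom = BloomFilter()
for subject, _, _ in creators:
    bloom.add(subject)

candidates = prefilter(titles, bloom)          # fewer triples reach the join
joined = join_on_subject(creators, candidates)
print(joined)
```

Because a Bloom filter can return false positives, the sketch still performs an exact join afterwards; the filter only reduces how many triples have to be shuffled to that join. In a MapReduce setting the filtering step would typically run in the map phase, but how the chapter distributes the filters is not stated in this record.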
Pages: 1-33
Page count: 33