Fast execution of RDF queries using Apache Hadoop

Authors
Mazumdar, Somnath [1]
Scionti, Alberto [2]
Affiliations
[1] Univ Siena, Dept Informat Engn & Math, Siena, Italy
[2] Ist Super Mario Boella ISMB, Turin, Italy
Source
Keywords
SPARQL; ENGINE;
DOI
10.1016/bs.adcom.2020.03.001
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Map-Reduce (MR) is a distributed programming framework that has become very popular since its introduction because of its ability to process massive data sets. MR provides a robust and straightforward mechanism for implementing distributed applications without worrying much about many management aspects of parallel programming (e.g., instantiating jobs, data distribution, job synchronization). The Resource Description Framework (RDF), with its simplicity and flexibility, can represent semistructured and unstructured data, which are very important for representing web semantics. SPARQL is a query language for retrieving and manipulating data stored in RDF format, and it also supports "Big Data" applications. In this book chapter, we present a framework designed to evaluate complex SPARQL queries quickly. To improve the execution of SPARQL queries, we implemented the query engine on the Hadoop framework. The engine can handle large and complex queries involving multiple join variables while running on large RDF data sets. Further execution speedup is obtained by preprocessing the input data with parallel Bloom filters. The query engine has been tested on the SP2 benchmark, and the results demonstrate the benefits of the design: the minimum query improvement is 5%, while the maximum improvement achieved is 82%.
Pages: 1 - 33
Number of pages: 33