Fast execution of RDF queries using Apache Hadoop

Times cited: 0
Authors
Mazumdar, Somnath [1]
Scionti, Alberto [2]
Affiliations
[1] Univ Siena, Dept Informat Engn & Math, Siena, Italy
[2] Ist Super Mario Boella ISMB, Turin, Italy
Keywords
SPARQL; ENGINE
DOI
10.1016/bs.adcom.2020.03.001
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Map-Reduce (MR) is a distributed programming framework that has become very popular since its introduction because of its ability to process massive data sets. MR provides a robust and straightforward mechanism for implementing distributed applications without the programmer having to manage many aspects of parallel programming (e.g., instantiating jobs, distributing data, synchronizing jobs). The Resource Description Framework (RDF), thanks to its simplicity and flexibility, can represent semistructured and unstructured data, which are very important for representing web semantics. SPARQL is a query language for retrieving and manipulating data stored in RDF format, and it also supports "Big Data" applications. In this book chapter, we present a framework designed to evaluate complex SPARQL queries quickly. To speed up SPARQL query execution, we implemented the query engine on the Hadoop framework. The engine can handle large and complex queries involving multiple join variables while running on large RDF data sets. A further speedup is obtained by preprocessing the input data with parallel Bloom filters. The query engine has been tested on the SP²Bench benchmark, and the results demonstrate the benefits of the design: the smallest query improvement is 5%, and the largest is 82%.
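The abstract only names the techniques. As a rough illustration of how a Bloom filter can shrink the input to a join on a shared SPARQL variable, the Python sketch below builds a filter from the subjects matching one triple pattern and uses it to discard non-matching triples of a second pattern before a reduce-side join. This is a sketch under stated assumptions, not the authors' engine: the map and reduce phases are simulated in a single process rather than on Hadoop, the hash scheme and filter size are arbitrary, and all function names, predicates, and the toy triples are hypothetical.

```python
# Illustrative sketch only: map/reduce phases are simulated in-process, not on
# Hadoop; all names, predicates, and toy data are hypothetical.
import hashlib
from collections import defaultdict


class BloomFilter:
    """Fixed-size Bloom filter with k salted SHA-256 hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False positives are possible; false negatives are not.
        return all(self.bits[pos] for pos in self._positions(item))


def match(triples, predicate, obj=None):
    """Return the triples matching the pattern (?s, predicate, obj or ?o)."""
    return [t for t in triples if t[1] == predicate and (obj is None or t[2] == obj)]


def reduce_side_join(left, right):
    """Join two triple lists on their subject (the shared join variable ?s)."""
    groups = defaultdict(lambda: ([], []))
    # Simulated "map" phase: key every record by its join value, tagged by side.
    for s, _, o in left:
        groups[s][0].append(o)
    for s, _, o in right:
        groups[s][1].append(o)
    # Simulated "reduce" phase: pair left and right values sharing a key.
    return sorted(
        (key, lo, ro)
        for key, (los, ros) in groups.items()
        for lo in los
        for ro in ros
    )


def bloom_prefilter_join(triples):
    """Evaluate { ?s rdf:type bench:Article . ?s dc:creator ?o } with a Bloom pre-filter."""
    articles = match(triples, "rdf:type", "bench:Article")
    bf = BloomFilter()
    for s, _, _ in articles:
        bf.add(s)
    # Drop creator triples whose subject cannot be an article, before the join.
    creators = [t for t in match(triples, "dc:creator") if bf.might_contain(t[0])]
    return reduce_side_join(articles, creators), len(creators)


data = [
    ("a1", "rdf:type", "bench:Article"),
    ("a2", "rdf:type", "bench:Article"),
    ("i1", "rdf:type", "bench:Inproceedings"),
    ("a1", "dc:creator", "alice"),
    ("a2", "dc:creator", "bob"),
    ("i1", "dc:creator", "carol"),
]
rows, kept = bloom_prefilter_join(data)
print(rows)
print(f"creator triples kept after filtering: {kept} of 3")
```

Because a Bloom filter can return false positives but never false negatives, the pre-filter may let a few non-matching triples through (the join then discards them) but never drops a triple that belongs in the result; the saving comes from sending fewer records to the reduce phase.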
Pages: 1-33
Number of pages: 33