H2RDF+: High-performance Distributed Joins over Large-scale RDF Graphs

Times cited: 0
Authors
Papailiou, Nikolaos [1]
Konstantinou, Ioannis [1]
Tsoumakos, Dimitrios [2]
Karras, Panagiotis [3]
Koziris, Nectarios [1]
Affiliations
[1] Natl Tech Univ Athens, Comp Syst Lab, GR-10682 Athens, Greece
[2] Ionian Univ, Dept Informat, Corfu, Greece
[3] Rutgers State Univ, Management Sci & Informat Syst, New Brunswick, NJ 08901 USA
Keywords
RDF; SPARQL; MapReduce; HBase; Distributed Indexing; Distributed Merge-Joins
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
The proliferation of data in RDF format calls for efficient and scalable solutions for its management. While scalability in the era of big data is a hard requirement, modern systems fail to adapt to the complexity of the query. Current approaches do not scale well when faced with substantially complex, non-selective joins, resulting in exponential growth of execution times. In this work we present H2RDF+, an RDF store that efficiently performs distributed Merge and Sort-Merge joins over a multiple-index scheme. H2RDF+ is highly scalable, using distributed MapReduce processing and HBase indexes. Through aggressive byte-level compression and result grouping over fast scans, it processes both complex and selective join queries efficiently. Furthermore, it adaptively chooses between single- and multi-machine execution based on join complexity estimated from index statistics. Our extensive evaluation demonstrates that H2RDF+ answers non-selective joins an order of magnitude faster than current state-of-the-art distributed and centralized stores, while being only tenths of a second slower on simple queries, and that it scales linearly with the amount of available resources.
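The abstract names two join strategies (Merge and Sort-Merge) and an adaptive choice between single- and multi-machine execution. The Python sketch below illustrates only the textbook techniques under stated assumptions; it is not the H2RDF+ implementation. A merge join needs both inputs sorted on the join key, which a scan over a sorted index provides; a sort-merge join first sorts inputs that are not already ordered, such as intermediate results. Inputs here are plain lists of (ID, binding) pairs standing in for HBase index scans, and `choose_execution` with its `threshold` value is a hypothetical cost rule, not the paper's cost model.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join-variable ID, remaining binding); stand-in for an index scan row


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Join two inputs that are already sorted by join key in one forward pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Sort unsorted inputs on the join key first, then merge them."""
    return merge_join(sorted(left, key=lambda r: r[0]), sorted(right, key=lambda r: r[0]))


def choose_execution(pattern_cardinalities: List[int], threshold: int = 1_000_000) -> str:
    """Hypothetical rule: run centrally if the estimated join input is small enough.

    The cardinalities are assumed to come from index statistics; the threshold is illustrative.
    """
    estimated_input = sum(pattern_cardinalities)
    return "centralized" if estimated_input <= threshold else "distributed"
```

For example, `merge_join([(1, "a"), (2, "b")], [(2, "x")])` returns `[(2, "b", "x")]`. Scanning both sorted inputs once avoids building a hash table, which is why sorted index layouts make merge joins attractive.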
Pages: 9
Related papers (50 in total)
  • [1] H2RDF+: An Efficient Data Management System for Big RDF Graphs
    Papailiou, Nikolaos
    Tsoumakos, Dimitrios
    Konstantinou, Ioannis
    Karras, Panagiotis
    Koziris, Nectarios
    [J]. SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, : 909 - 912
  • [2] Distributed Query Evaluation over Large RDF Graphs
    Peng, Peng
    [J]. WEB AND BIG DATA, APWEB-WAIM 2019, 2019, 11809 : 3 - 7
  • [3] Algebra of RDF Graphs for Querying Large-Scale Distributed Triple-Store
    Savnik, Iztok
    Nitta, Kiyoshi
    [J]. AVAILABILITY, RELIABILITY, AND SECURITY IN INFORMATION SYSTEMS, CD-ARES 2016, PAML 2016, 2016, 9817 : 3 - 18
  • [4] RDF packages: a scheme for efficient reasoning and querying over large-scale RDF data
    Ohsawa, Shohei
    Amagasa, Toshiyuki
    Kitagawa, Hiroyuki
    [J]. INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2012, 8 (02) : 212 - +
  • [5] High-performance, Distributed Dictionary Encoding of RDF Datasets
    Morari, Alessandro
    Weaver, Jesse
    Villa, Oreste
    Haglin, David
    Tumeo, Antonino
    Castellana, Vito Giovanni
    Feo, John
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 250 - 253
  • [6] TraPath: Fast Regular Path Query Evaluation on Large-Scale RDF Graphs
    Wang, Xin
    Rao, Guozheng
    Jiang, Longxiang
    Lyu, Xuedong
    Yang, Yajun
    Feng, Zhiyong
    [J]. WEB-AGE INFORMATION MANAGEMENT, WAIM 2014, 2014, 8485 : 372 - 383
  • [7] PDSM: Pregel-Based Distributed Subgraph Matching on Large Scale RDF Graphs
    Xu, Qiang
    Wang, Xin
    Xin, Yueqi
    Feng, Zhiyong
    Chen, Renhai
    [J]. COMPANION PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2018 (WWW 2018), 2018, : 17 - 18
  • [8] Large-Scale Incremental OWL/RDFS Reasoning over Fuzzy RDF Data
    Jagvaral, Batselem
    Wangon, Lee
    Park, Hyun-Kyu
    Jeon, Myungjoong
    Lee, Nam-Gee
    Park, Young-Tack
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2017, : 269 - 273
  • [9] SparkRDF: In-Memory Distributed RDF Management Framework for Large-Scale Social Data
    Xu, Zhichao
    Chen, Wei
    Gai, Lei
    Wang, Tengjiao
    [J]. WEB-AGE INFORMATION MANAGEMENT (WAIM 2015), 2015, 9098 : 337 - 349
  • [10] Grace: An Efficient Parallel SPARQL Query System over Large-Scale RDF Data
    Kang, Xiang
    Zhao, Yuying
    Yuan, Pingpeng
    Jin, Hai
    [J]. PROCEEDINGS OF THE 2021 IEEE 24TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2021, : 769 - 774