GoFast: Graph-based optimization for efficient and scalable query evaluation

Cited by: 4
Authors
Zouaghi, Ishaq [1, 3]
Mesmoudi, Amin [2]
Galicia, Jorge [1]
Bellatreche, Ladjel [1]
Aguili, Taoufik [3]
Affiliations
[1] LIAS ISAE ENSMA, Chasseneuil, France
[2] Univ Poitiers, LIAS, Poitiers, France
[3] LR SysCom ENIT UTM, Tunis, Tunisia
Keywords
Optimization; RDF; SPARQL; Cardinality estimation; Cost model
DOI
10.1016/j.is.2021.101738
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The popularity of the Resource Description Framework (RDF) and SPARQL has driven the development of high-performance systems to manage data represented with this model. Earlier approaches adapted the well-established relational model, applying its storage, query processing, and optimization strategies. However, the techniques borrowed from the relational model are not universally applicable in the RDF context. First, the schema-free nature of RDF induces intensive join overhead. Second, optimization strategies that try to find the optimal join order rely on error-prone statistics that cannot capture all the correlations among triples. Graph-based approaches keep the graph structure of RDF, representing the data directly as a graph. Their execution model relies on graph exploration operators to find subgraph matches to a query. Although they have been shown to outperform relational-based systems on complex queries, they scale poorly and their optimization techniques are completely system dependent. Recently, some systems such as RDF_QDAG have shown that combining graph exploration with triple clustering achieves a good compromise between performance and scalability. In this paper, we propose optimization strategies for this kind of RDF management system. First, we define novel statistics, collected for clusters of triples, to better capture the dependencies found in the original graph. Second, we redefine an execution plan based on these logical structures, which makes it possible to represent the RDF graph exploration process. Third, we introduce an algorithm for selecting the optimal execution plan based on a customized cost model. Finally, we propose a new approach to refine the chosen plan by pruning invalid clusters that do not participate in the construction of the final query results. All our proposals are validated experimentally using well-known RDF benchmarks. © 2021 Elsevier Ltd. All rights reserved.
Pages: 18
Related papers
50 in total
  • [42] Query specific graph-based query reformulation using UMLS for clinical information access
    Sankhavara, Jainisha
    Dave, Rishi
    Dave, Bhargav
    Majumder, Prasenjit
    JOURNAL OF BIOMEDICAL INFORMATICS, 2020, 108
  • [43] Scalable Query Optimization for Efficient Data Processing using MapReduce
    Shan, Yi
    Chen, Yi
    2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015, 2015, : 649 - 652
  • [44] Toward scalable graph-based security analysis for cloud networks
    Sabur, Abdulhakim
    Chowdhary, Ankur
    Huang, Dijiang
    Alshamrani, Adel
    COMPUTER NETWORKS, 2022, 206
  • [45] Graph-based, scalable SoC router marries speed and flexibility
    Maliniak, D
    ELECTRONIC DESIGN, 2001, 49 (12) : 38 - 38
  • [46] ELPIS: Graph-Based Similarity Search for Scalable Data Science
    Azizi, Ilias
    Echihabi, Karima
    Palpanas, Themis
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (06): : 1548 - 1559
  • [47] Efficient graph-based search for object detection
    Wei, Hui
    Yang, Chengzhuan
    Yu, Qian
    INFORMATION SCIENCES, 2017, 385 : 395 - 414
  • [48] Efficient Hierarchical Graph-Based Video Segmentation
    Grundmann, Matthias
    Kwatra, Vivek
    Han, Mei
    Essa, Irfan
    2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010, : 2141 - 2148
  • [49] Efficient Bayesian Methods for Graph-based Recommendation
    Lopes, Ramon
    Assuncao, Renato
    Santos, Rodrygo L. T.
    PROCEEDINGS OF THE 10TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS'16), 2016, : 333 - 340
  • [50] Graph-Based Optimization of Public Lighting Retrofit
    Sedziwy, Adam
    Kotulski, Leszek
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT I, 2020, 12033 : 239 - 248