GoFast: Graph-based optimization for efficient and scalable query evaluation

Cited by: 4
Authors
Zouaghi, Ishaq [1, 3]
Mesmoudi, Amin [2]
Galicia, Jorge [1]
Bellatreche, Ladjel [1]
Aguili, Taoufik [3]
Affiliations
[1] LIAS ISAE ENSMA, Chasseneuil, France
[2] Univ Poitiers, LIAS, Poitiers, France
[3] LR SysCom ENIT UTM, Tunis, Tunisia
Keywords
Optimization; RDF; SPARQL; Cardinality estimation; Cost model
DOI
10.1016/j.is.2021.101738
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The popularity of the Resource Description Framework (RDF) and SPARQL has driven the development of high-performance systems to manage data represented with this model. Early approaches adapted the well-established relational model, applying its storage, query processing, and optimization strategies. However, the techniques borrowed from the relational model are not universally applicable in the RDF context. First, the schema-free nature of RDF induces heavy join overheads. Second, optimization strategies that try to find the optimal join order rely on error-prone statistics that cannot capture all the correlations among triples. Graph-based approaches preserve the graph structure of RDF by representing the data directly as a graph. Their execution model relies on graph exploration operators to find subgraph matches to a query. Although they have been shown to outperform relational-based systems on complex queries, they scale poorly and their optimization techniques are entirely system-dependent. Recently, systems such as RDF_QDAG have shown that combining graph exploration with triple clustering can achieve a good compromise between performance and scalability. In this paper, we propose optimization strategies for this kind of RDF management system. First, we define novel statistics collected for clusters of triples to better capture the dependencies found in the original graph. Second, we redefine the execution plan in terms of these logical structures, which makes it possible to represent the RDF graph exploration process. Third, we introduce an algorithm for selecting the optimal execution plan based on a customized cost model. Finally, we propose a new approach to refine the chosen plan by pruning invalid clusters that do not participate in the construction of the final query results. All our proposals are validated experimentally using well-known RDF benchmarks. © 2021 Elsevier Ltd. All rights reserved.
Pages: 18