Accurate Sampling-Based Cardinality Estimation for Complex Graph Queries

Cited by: 0
Authors
Hu, Pan [1]
Motik, Boris [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
[2] Univ Oxford, Dept Comp Sci, Oxford, England
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2024, Vol. 49, No. 3
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Cardinality estimation; conjunctive queries; sampling; query planning; SELECTIVITY; MODELS;
DOI
10.1145/3689209
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Accurately estimating the cardinality (i.e., the number of answers) of complex queries plays a central role in database systems. This problem is particularly difficult in graph databases, where queries often involve a large number of joins and self-joins. Recently, Park et al. [55] surveyed seven state-of-the-art cardinality estimation approaches for graph queries. The results of their extensive empirical evaluation show that a sampling method based on the WanderJoin online aggregation algorithm [47] consistently offers superior accuracy. We extended the framework by Park et al. [55] with three additional datasets and repeated their experiments. Our results showed that WanderJoin is indeed very accurate, but it can often take a large number of samples and thus be very slow. Moreover, when queries are complex and data distributions are skewed, it often fails to find valid samples and estimates the cardinality as zero. Finally, complex graph queries often go beyond simple graph matching and involve arbitrary nesting of relational operators such as disjunction, difference, and duplicate elimination. None of the methods considered by Park et al. [55] is applicable to such queries. In this article, we present a novel approach for estimating the cardinality of complex graph queries. Our approach is inspired by WanderJoin, but, unlike all approaches known to us, it can process complex queries with arbitrary operator nesting. Our estimator is strongly consistent, meaning that the average of repeated estimates converges with probability one to the actual cardinality. We present optimisations of the basic algorithm that aim to reduce the chance of producing zero estimates and improve accuracy. We show empirically that our approach is both accurate and quick on complex queries and large datasets.
Finally, we discuss how to integrate our approach into a simple dynamic programming query planner, and we confirm empirically that our planner produces high-quality plans that can significantly reduce end-to-end query evaluation times.
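The random-walk sampling idea that the abstract builds on can be illustrated with a minimal sketch. This is not the authors' algorithm and not the WanderJoin implementation: it only handles a chain of self-joins over a single edge relation (a directed path query with `hops` edges), and all function names are hypothetical. Each walk returns the inverse of its own sampling probability, or zero if it reaches a node with no successors, so the average over many walks is an unbiased estimate of the number of answers.

```python
import random
from collections import defaultdict

def build_index(edges):
    """Map each source node to the list of its outgoing targets."""
    index = defaultdict(list)
    for src, dst in edges:
        index[src].append(dst)
    return index

def exact_path_count(edges, hops):
    """Count directed walks with `hops` edges by brute force (for checking)."""
    index = build_index(edges)
    frontier = [dst for _, dst in edges]
    for _ in range(hops - 1):
        frontier = [nxt for node in frontier for nxt in index.get(node, [])]
    return len(frontier)

def walk_estimate(edges, index, hops, rng):
    """One random walk: inverse sampling probability, or 0.0 if the walk dies."""
    _, node = rng.choice(edges)          # first edge chosen uniformly
    weight = float(len(edges))
    for _ in range(hops - 1):
        successors = index.get(node, [])
        if not successors:               # dead end: no valid sample found
            return 0.0
        weight *= len(successors)        # next edge chosen uniformly among successors
        node = rng.choice(successors)
    return weight

def estimate_cardinality(edges, hops, samples, seed=0):
    """Average the per-walk estimates over `samples` independent walks."""
    rng = random.Random(seed)
    index = build_index(edges)
    total = sum(walk_estimate(edges, index, hops, rng) for _ in range(samples))
    return total / samples

if __name__ == "__main__":
    graph = [(1, 2), (1, 3), (2, 3), (3, 4), (2, 4), (4, 1)]
    print("exact:", exact_path_count(graph, 2))
    print("estimate:", estimate_cardinality(graph, 2, samples=20000))
```

On skewed data most walks in such a scheme can hit dead ends and contribute zero, which is the failure mode the abstract describes; the sketch does nothing to mitigate it and does not cover disjunction, difference, or duplicate elimination.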
Pages: 46
Related papers
50 in total
  • [1] Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries
    Qiu, Yuan
    Wang, Yilei
    Yi, Ke
    Li, Feifei
    Wu, Bin
    Zhan, Chaoqun
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 1465 - 1477
  • [2] Cardinality estimation for property graph queries with gated learning approach on the graph database
    He Z.
    Yu J.
    Du X.
    Guo B.
    Li Z.
    Li Z.
    Multimedia Tools and Applications, 2025, 84 (11) : 9159 - 9183
  • [3] Sampling-based lower bounds for counting queries
    Gogate, Vibhav
    Dechter, Rina
    INTELLIGENZA ARTIFICIALE, 2011, 5 (02) : 171 - 188
  • [4] Sampling-based estimators for subset-based queries
    Shantanu Joshi
    Christopher Jermaine
    The VLDB Journal, 2009, 18 : 181 - 202
  • [5] Sampling-based estimators for subset-based queries
    Joshi, Shantanu
    Jermaine, Christopher
    VLDB JOURNAL, 2009, 18 (01) : 181 - 202
  • [6] Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins
    Neumann, Thomas
    Moerkotte, Guido
    IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011, : 984 - 994
  • [7] GRAPH REDUCTIONS TO SPEED UP IMPORTANCE SAMPLING-BASED STATIC RELIABILITY ESTIMATION
    L'Ecuyer, Pierre
    Saggadi, Samira
    Tuffin, Bruno
    PROCEEDINGS OF THE 2011 WINTER SIMULATION CONFERENCE (WSC), 2011, : 429 - 438
  • [8] An accurate sampling-based method for approximating geometry
    Chen, Yong
    COMPUTER-AIDED DESIGN, 2007, 39 (11) : 975 - 986
  • [9] A Sampling-Based Tool for Scaling Graph Datasets
    Musaafir, Ahmed
    Uta, Alexandru
    Dreuning, Henk
    Varbanescu, Ana-Lucia
    PROCEEDINGS OF THE ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING (ICPE'20), 2020, : 289 - 300
  • [10] A Sampling-Based Approach to Accelerating Queries in Log Management Systems
    Wagner, Tal
    Schkufza, Eric
    Wieder, Udi
    COMPANION PROCEEDINGS OF THE 2016 ACM SIGPLAN INTERNATIONAL CONFERENCE ON SYSTEMS, PROGRAMMING, LANGUAGES AND APPLICATIONS: SOFTWARE FOR HUMANITY (SPLASH COMPANION'16), 2016, : 37 - 38