Accurate Sampling-Based Cardinality Estimation for Complex Graph Queries

Cited by: 0
Authors
Hu, Pan [1]
Motik, Boris [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
[2] Univ Oxford, Dept Comp Sci, Oxford, England
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2024 / Volume 49 / Issue 03
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Cardinality estimation; conjunctive queries; sampling; query planning; SELECTIVITY; MODELS;
DOI
10.1145/3689209
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Accurately estimating the cardinality (i.e., the number of answers) of complex queries plays a central role in database systems. This problem is particularly difficult in graph databases, where queries often involve a large number of joins and self-joins. Recently, Park et al. [55] surveyed seven state-of-the-art cardinality estimation approaches for graph queries. The results of their extensive empirical evaluation show that a sampling method based on the WanderJoin online aggregation algorithm [47] consistently offers superior accuracy. We extended the framework by Park et al. [55] with three additional datasets and repeated their experiments. Our results showed that WanderJoin is indeed very accurate, but it can often take a large number of samples and thus be very slow. Moreover, when queries are complex and data distributions are skewed, it often fails to find valid samples and estimates the cardinality as zero. Finally, complex graph queries often go beyond simple graph matching and involve arbitrary nesting of relational operators such as disjunction, difference, and duplicate elimination. None of the methods considered by Park et al. [55] is applicable to such queries. In this article, we present a novel approach for estimating the cardinality of complex graph queries. Our approach is inspired by WanderJoin, but, unlike all approaches known to us, it can process complex queries with arbitrary operator nesting. Our estimator is strongly consistent, meaning that the average of repeated estimates converges with probability one to the actual cardinality. We present optimisations of the basic algorithm that aim to reduce the chance of producing zero estimates and improve accuracy. We show empirically that our approach is both accurate and quick on complex queries and large datasets.
Finally, we discuss how to integrate our approach into a simple dynamic programming query planner, and we confirm empirically that our planner produces high-quality plans that can significantly reduce end-to-end query evaluation times.
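To make the sampling idea in the abstract concrete, the following is a minimal sketch of a random-walk cardinality estimator in the general style of WanderJoin, applied to a simple path query over a directed edge list. It is not the authors' algorithm and does not handle nested operators; the function names `build_index` and `estimate_path_count`, the edge-list representation, and the toy graphs are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_index(edges):
    """Map each source vertex to the list of its successors (hypothetical helper)."""
    index = defaultdict(list)
    for src, dst in edges:
        index[src].append(dst)
    return index


def estimate_path_count(edges, length, num_walks, rng):
    """Estimate how many directed paths with `length` edges exist in `edges`.

    Each walk picks a first edge uniformly at random and then repeatedly
    picks a uniform successor. A completed walk is weighted by the inverse
    of its sampling probability; a walk that hits a dead end contributes 0.
    The average weight is an unbiased estimate of the path count.
    """
    index = build_index(edges)
    total = 0.0
    for _ in range(num_walks):
        _, current = rng.choice(edges)       # first edge, probability 1/len(edges)
        weight = float(len(edges))
        for _ in range(length - 1):
            successors = index.get(current)  # .get avoids creating empty entries
            if not successors:
                weight = 0.0                 # invalid sample: walk cannot be extended
                break
            weight *= len(successors)        # uniform choice among the successors
            current = rng.choice(successors)
        total += weight
    return total / num_walks


# Toy graph (assumed data): exactly one 2-edge path, 0 -> 1 -> 3.
toy_edges = [(0, 1), (0, 2), (1, 3)]
print(estimate_path_count(toy_edges, 2, 20000, random.Random(1)))
```

In this toy graph only one in three walks completes, so a run with very few walks can easily return zero. This is a small-scale instance of the zero-estimate problem on skewed data that the abstract describes; averaging more walks moves the estimate toward the true count.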
Pages: 46