Accurate Sampling-Based Cardinality Estimation for Complex Graph Queries

Authors
Hu, Pan [1]
Motik, Boris [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
[2] Univ Oxford, Dept Comp Sci, Oxford, England
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2024 / Vol. 49 / Issue 03
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Cardinality estimation; conjunctive queries; sampling; query planning; SELECTIVITY; MODELS;
DOI
10.1145/3689209
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Accurately estimating the cardinality (i.e., the number of answers) of complex queries plays a central role in database systems. This problem is particularly difficult in graph databases, where queries often involve a large number of joins and self-joins. Recently, Park et al. [55] surveyed seven state-of-the-art cardinality estimation approaches for graph queries. The results of their extensive empirical evaluation show that a sampling method based on the WanderJoin online aggregation algorithm [47] consistently offers superior accuracy. We extended the framework by Park et al. [55] with three additional datasets and repeated their experiments. Our results showed that WanderJoin is indeed very accurate, but it can often take a large number of samples and thus be very slow. Moreover, when queries are complex and data distributions are skewed, it often fails to find valid samples and estimates the cardinality as zero. Finally, complex graph queries often go beyond simple graph matching and involve arbitrary nesting of relational operators such as disjunction, difference, and duplicate elimination. None of the methods considered by Park et al. [55] is applicable to such queries. In this article, we present a novel approach for estimating the cardinality of complex graph queries. Our approach is inspired by WanderJoin, but, unlike all approaches known to us, it can process complex queries with arbitrary operator nesting. Our estimator is strongly consistent, meaning that the average of repeated estimates converges with probability one to the actual cardinality. We present optimisations of the basic algorithm that aim to reduce the chance of producing zero estimates and improve accuracy. We show empirically that our approach is both accurate and quick on complex queries and large datasets.
Finally, we discuss how to integrate our approach into a simple dynamic programming query planner, and we confirm empirically that our planner produces high-quality plans that can significantly reduce end-to-end query evaluation times.
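For readers unfamiliar with random-walk sampling, the following is a minimal sketch of the general idea behind WanderJoin-style estimation, restricted to simple path queries over a single edge relation. It is not the algorithm of the article and does not handle disjunction, difference, or duplicate elimination. The function names (`exact_path_count`, `estimate_path_count`) and the toy edge list are hypothetical and chosen only for illustration. Each walk picks a first edge uniformly at random, then extends it one hop at a time, and returns the inverse of the probability of the walk it followed, or zero when it hits a dead end. Averaging these values gives an unbiased estimate of the number of matching paths.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to the list of its successors."""
    out = defaultdict(list)
    for source, target in edges:
        out[source].append(target)
    return out


def exact_path_count(edges, hops):
    """Ground truth: count all walks consisting of `hops` edges."""
    out = build_adjacency(edges)
    # ways[v] = number of walks of the current length that end at v
    ways = defaultdict(int)
    for _, target in edges:
        ways[target] += 1
    for _ in range(hops - 1):
        extended = defaultdict(int)
        for node, count in ways.items():
            for target in out.get(node, ()):
                extended[target] += count
        ways = extended
    return sum(ways.values())


def estimate_path_count(edges, hops, num_walks, rng):
    """Illustrative random-walk estimator (an assumption, not the paper's method)."""
    out = build_adjacency(edges)
    total = 0.0
    for _ in range(num_walks):
        # The first edge is drawn uniformly, so its probability is 1/|E|.
        _, node = rng.choice(edges)
        weight = float(len(edges))
        for _ in range(hops - 1):
            successors = out.get(node)
            if not successors:
                # Dead end: this walk matches nothing and contributes zero.
                weight = 0.0
                break
            # The next edge is drawn uniformly among the successors.
            weight *= len(successors)
            node = rng.choice(successors)
        total += weight
    return total / num_walks


if __name__ == "__main__":
    toy_edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 1)]
    rng = random.Random(0)
    print("exact:", exact_path_count(toy_edges, 2))
    print("estimate:", estimate_path_count(toy_edges, 2, 20000, rng))
```

The dead-end branch illustrates the failure mode mentioned in the abstract: if most walks fail to complete, most samples contribute zero, and with few samples the estimate itself can be zero even when the true cardinality is not.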
Pages: 46