Cardinality estimation using normalizing flow

Authors
Jiayi Wang
Chengliang Chai
Jiabin Liu
Guoliang Li
Affiliations
[1] Tsinghua University, Department of Computer Science and Technology
[2] Beijing Institute of Technology, Department of Computer Science and Technology
Keywords
Cardinality estimation; Query optimization; AI for DB
DOI
Not available
Abstract
Cardinality estimation is one of the most important problems in query optimization. Recently, machine learning-based techniques have been proposed to effectively estimate cardinality, which can be broadly classified into query-driven and data-driven approaches. Query-driven approaches learn a regression model from a query to its cardinality, while data-driven approaches learn a distribution of tuples, select some samples that satisfy a SQL query, and use the data distributions of these selected tuples to estimate the cardinality of the SQL query. As query-driven methods rely on training queries, the estimation quality is not reliable when there are no high-quality training queries, while data-driven methods have no such limitation and have high adaptivity. In this work, we focus on data-driven methods. A good data-driven model should achieve three optimization goals. First, the model needs to capture data dependencies between columns and support large domain sizes (achieving high accuracy). Second, the model should achieve high inference efficiency, because many data samples are needed to estimate the cardinality (achieving low inference latency). Third, the model should not be too large (achieving a small model size). However, existing data-driven methods cannot simultaneously optimize the three goals. To address the limitations, we propose a novel cardinality estimator FACE, which leverages the normalizing flow-based model to learn a continuous joint distribution for relational data.
FACE can transform a complex distribution over continuous random variables into a simple distribution (e.g., multivariate normal distribution) and use the probability density to estimate the cardinality for both sequential queries and parallel queries. First, we design a dequantization method to make data more “continuous.” Second, we propose encoding and indexing techniques to handle Like predicates for string data. Third, we propose a Monte Carlo method to estimate the cardinality based on the FACE model. Fourth, we propose a grouping technique to process parallel queries. Fifth, we discuss how to support join queries. Experimental results show that our method significantly outperforms existing approaches in terms of estimation accuracy while keeping similar latency and model size.
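To make the density-based estimation idea in the abstract concrete, the following is a minimal sketch and not the FACE implementation. It assumes uniform dequantization (adding noise in [0, 1) to an integer-coded value) and plain Monte Carlo integration with points drawn uniformly from the query's range box; the paper's actual dequantization and sampling schemes may differ. The learned flow density is replaced by a hypothetical stand-in (independent standard normals), and all function names are illustrative.

```python
import math
import random


def dequantize(value, rng):
    """Uniform dequantization (assumed variant): spread an integer code v
    over the unit-width interval starting at v so the data looks continuous."""
    return value + rng.random()


def standard_normal_pdf(x):
    """Density of a one-dimensional standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def density(point):
    """Hypothetical stand-in for a learned flow density p(x):
    a product of independent standard normal densities."""
    p = 1.0
    for x in point:
        p *= standard_normal_pdf(x)
    return p


def estimate_cardinality(lows, highs, num_rows, num_samples=20000, seed=0):
    """Estimate the cardinality of a conjunctive range query.

    selectivity ~= volume(box) * mean of p(x) over uniform samples in the box
    cardinality ~= num_rows * selectivity
    """
    rng = random.Random(seed)
    volume = 1.0
    for lo, hi in zip(lows, highs):
        volume *= hi - lo
    total = 0.0
    for _ in range(num_samples):
        point = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        total += density(point)
    selectivity = volume * total / num_samples
    return num_rows * selectivity


# Example: two columns, predicate -1 <= c1 <= 1 AND -1 <= c2 <= 1,
# on a hypothetical table with one million rows.
estimate = estimate_cardinality([-1.0, -1.0], [1.0, 1.0], num_rows=1_000_000)
```

Under the stand-in density, the exact selectivity of this predicate is the squared probability that a standard normal falls within one standard deviation (about 0.466), so the estimate should land near 466,000 rows. With a real flow model, `density` would instead evaluate the learned model on a batch of sample points.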
Pages: 323-348
Page count: 25
Source
VLDB Journal, 2024, 33(2): 323-348