Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries

Cited by: 46
Authors
Hasan, Shohedul [1]
Thirumuruganathan, Saravanan [2]
Augustine, Jees [1]
Koudas, Nick [3]
Das, Gautam [1]
Affiliations
[1] UT Arlington, Arlington, TX 76019 USA
[2] HBKU, QCRI, Doha, Qatar
[3] Univ Toronto, Toronto, ON, Canada
Funding
US National Science Foundation
Keywords
selectivity estimation; cardinality estimation; deep learning; density estimation; neural autoregressive models; MADE;
DOI
10.1145/3318464.3389741
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Selectivity estimation - the problem of estimating the result size of queries - is a fundamental problem in databases. Accurate estimation of query selectivity involving multiple correlated attributes is especially challenging. Poor cardinality estimates could result in the selection of bad plans by the query optimizer. Recently, deep learning has been applied to this problem with promising results. However, many of the proposed approaches struggle to provide accurate results for multi-attribute queries that involve a large number of predicates and have low selectivity. In this paper, we propose two complementary approaches that are effective for this scenario. Our first approach models selectivity estimation as a density estimation problem where one seeks to estimate the joint probability distribution from a finite number of samples. We leverage techniques from neural density estimation to build an accurate selectivity estimator. The key idea is to decompose the joint distribution into a set of tractable conditional probability distributions such that they satisfy the autoregressive property. Our second approach formulates selectivity estimation as a supervised deep learning problem that predicts the selectivity of a given query. We describe how to extend our algorithms to range queries. We also introduce and address a number of practical challenges arising when adapting deep learning for relational data. These include query/data featurization, incorporating query workload information in a deep learning framework, and the dynamic scenario where both data and workload queries could be updated. Our extensive experiments, with a special emphasis on queries with a large number of predicates and/or small result sizes, demonstrate that our proposed techniques provide fast and accurate selectivity estimates with minimal space overhead.
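The autoregressive decomposition mentioned in the abstract can be illustrated with a minimal sketch. It writes the selectivity of a point query as a chain-rule product P(A1) * P(A2 | A1) * P(A3 | A1, A2). This is not the paper's estimator: the paper learns the conditionals with a neural autoregressive model, whereas this sketch swaps in exact empirical conditionals counted from a tiny table. The relation `ROWS`, its attribute order, and the helper names `conditional` and `selectivity` are all hypothetical and exist only for illustration.

```python
# Hypothetical toy relation with attributes (city, dept, level).
# The column order fixes the autoregressive ordering A1, A2, A3.
ROWS = [
    ("NYC", "eng", "senior"),
    ("NYC", "eng", "junior"),
    ("NYC", "ops", "junior"),
    ("SF", "eng", "senior"),
    ("SF", "eng", "senior"),
    ("SF", "ops", "junior"),
    ("LA", "ops", "junior"),
    ("LA", "ops", "senior"),
]


def conditional(rows, prefix, value):
    """Empirical P(A_i = value | A_1..A_{i-1} = prefix), with i = len(prefix) + 1.

    Assumption: a learned model would replace this exact count-based lookup.
    """
    i = len(prefix)
    matching = [r for r in rows if r[:i] == prefix]
    if not matching:
        return 0.0  # the prefix never occurs, so the conditional is taken as 0
    return sum(1 for r in matching if r[i] == value) / len(matching)


def selectivity(rows, point):
    """Selectivity of a point query that fixes every attribute.

    Multiplies the conditionals in attribute order (the chain rule).
    """
    sel = 1.0
    for i, value in enumerate(point):
        sel *= conditional(rows, tuple(point[:i]), value)
    return sel


if __name__ == "__main__":
    query = ("SF", "eng", "senior")
    estimated = selectivity(ROWS, query)
    exact = sum(1 for r in ROWS if r == query) / len(ROWS)
    print(estimated, exact)
```

With exact conditionals the product equals the true fraction of matching rows, up to floating-point rounding. A learned model approximates each factor in compact form instead of storing the data. Queries that leave some attributes unspecified are not handled by this sketch.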
Pages: 1035-1050
Number of pages: 16
Related papers
50 in total
  • [1] Multi-Attribute Queries: To Merge or Not to Merge?
    Rastegari, Mohammad
    Diba, Ali
    Parikh, Devi
    Farhadi, Ali
    [J]. 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 3310 - 3317
  • [2] Range Queries on Multi-Attribute Trajectories
    Xu, Jianqiu
    Lu, Hua
    Gueting, Ralf Hartmut
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (06) : 1206 - 1211
  • [3] GUIDE TO CHOICE AND ESTIMATION OF MULTI-ATTRIBUTE UTILITY MODELS
    DYER, JS
    SARIN, RK
    [J]. OPERATIONS RESEARCH, 1975, 23 : B385 - B385
  • [4] LAF: A Local Depth Autoregressive Framework for Cardinality Estimation of Multi-attribute Queries
    Cheng, Qianwen
    Li, Hao
    Wang, Dawei
    Zhang, Yue
    Peng, Zhaohui
    [J]. WEB AND BIG DATA, PT III, APWEB-WAIM 2023, 2024, 14333 : 296 - 311
  • [5] Deep Learning Method for Multi-Attribute Analysis of Fingerprint Images
    Maiti, Diptadip
    Basak, Madhuchhanda
    Das, Debashis
    [J]. COMPUTER SCIENCE JOURNAL OF MOLDOVA, 2024, 32 (02) : 199 - 222
  • [6] Multi-attribute Range Queries on Structured Overlay Networks
    Lai, Kuan-Chou
    Huang, Kuo-Chan
    Yu, You-Fu
    [J]. JOURNAL OF INTERNET TECHNOLOGY, 2011, 12 (02) : 269 - 278
  • [7] Mercury: Supporting scalable multi-attribute range queries
    Bharambe, AR
    Agrawal, M
    Seshan, S
    [J]. ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2004, 34 (04) : 353 - 366
  • [8] Continuous Range Queries over Multi-Attribute Trajectories
    Xu, Jianqiu
    Bao, Zhifeng
    Lu, Hua
    [J]. 2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 1610 - 1613
  • [9] Image Ranking and Retrieval based on Multi-Attribute Queries
    Siddiquie, Behjat
    Feris, Rogerio S.
    Davis, Larry S.
    [J]. 2011 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2011, : 801 - 808
  • [10] MULTI-ATTRIBUTE - MULTI-BRAND MODELS
    WOODSIDE, AG
    CLOKEY, JD
    [J]. JOURNAL OF ADVERTISING RESEARCH, 1974, 14 (05) : 33 - 40