Cardinality Estimation of Approximate Substring Queries using Deep Learning

Cited by: 5
Authors
Kwon, Suyong [1]
Jung, Woohwan [2]
Shim, Kyuseok [1]
Affiliations
[1] Seoul Natl Univ, Elect & Comp Engn, Seoul, South Korea
[2] Hanyang Univ, Comp Sci & Engn, Seoul, South Korea
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2022 / Vol. 15 / No. 11
Funding
National Research Foundation, Singapore
Keywords
SELECTIVITY ESTIMATION
DOI
10.14778/3551793.3551859
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Cardinality estimation of approximate substring queries is an important problem in database systems. Traditional approaches build a summary of the text data and estimate cardinality from that summary under certain statistical assumptions. Because deep learning models can effectively learn complex underlying data patterns, they have been successfully applied to cardinality estimation of queries in database systems and shown to outperform traditional methods. However, they have not yet been applied to approximate substring queries, so we investigate a deep learning approach to cardinality estimation for such queries. Although the accuracy of deep learning models tends to improve as the training data grows, generating a large training set is computationally expensive for cardinality estimation of approximate substring queries. We therefore develop efficient training data generation algorithms that avoid unnecessary computations and share common computations. We also propose a deep learning model, together with a novel learning method, for quickly obtaining an accurate deep learning-based estimator. Extensive experiments confirm the superiority of our data generation algorithms and of our deep learning model trained with the novel learning method.
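To make the estimation target concrete, here is a minimal brute-force sketch in Python. It assumes a common definition that the abstract does not spell out: the cardinality of an approximate substring query (q, k) over a string collection is the number of strings containing some substring within edit distance k of q. The function names are hypothetical, and the code computes exact labels with Sellers' semi-global dynamic program. It is not the authors' training data generation algorithm, which according to the abstract avoids unnecessary computations and shares common computations.

```python
from typing import Iterable


def min_substring_edit_distance(query: str, text: str) -> int:
    """Smallest edit distance between `query` and any substring of `text`.

    Sellers' semi-global DP: row 0 is all zeros, so a match may start at
    any position of `text`; taking the minimum over the last row lets it
    end anywhere.
    """
    m = len(query)
    prev = list(range(m + 1))  # column for the empty text prefix: D[i][0] = i
    best = prev[m]             # matching the empty substring costs len(query)
    for ch in text:
        cur = [0] * (m + 1)    # D[0][j] = 0: a match may begin here for free
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == ch else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitute
                prev[i] + 1,         # skip a text character
                cur[i - 1] + 1,      # skip a query character
            )
        best = min(best, cur[m])
        prev = cur
    return best


def approx_substring_cardinality(strings: Iterable[str], query: str, k: int) -> int:
    """Hypothetical helper producing an exact cardinality label: the number of
    strings containing a substring within edit distance k of `query`
    (assumed query semantics)."""
    return sum(1 for s in strings if min_substring_edit_distance(query, s) <= k)


if __name__ == "__main__":
    data = ["seoul", "busan", "seoal", "hanyang"]
    print(approx_substring_cardinality(data, "seoul", 1))  # 2
```

Labeling one query this way costs O(|q| · |s|) per string, so over the whole collection the cost grows with the total text length for every training query. This is why naively generating a large training set is expensive.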
Pages: 3145–3157
Number of pages: 13
Related papers
  • [1] Cardinality estimation of activity trajectory similarity queries using deep learning
    Tian, Ruijie
    Zhang, Weishi
    Wang, Fei
    Zhou, Jingchun
    Alhudhaif, Adi
    Alenezi, Fayadh
    INFORMATION SCIENCES, 2023, 646
  • [2] A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation
    Wu, Peizhi
    Cong, Gao
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 2009 - 2022
  • [3] QardEst: Using Quantum Machine Learning for Cardinality Estimation of Join Queries
    Kittelmann, Florian
    Sulimov, Pavel
    Stockinger, Kurt
    PROCEEDINGS OF THE 1ST WORKSHOP ON QUANTUM COMPUTING AND QUANTUM-INSPIRED TECHNOLOGY FOR DATA-INTENSIVE SYSTEMS AND APPLICATIONS, Q-DATA, CO-LOCATED WITH ACM INTERNATIONAL CONFERENCE ON DATA MANAGEMENT, SIGMOD, 2024, : 2 - 13
  • [4] Sample-Efficient Cardinality Estimation Using Geometric Deep Learning
    Reiner, Silvan
    Grossniklaus, Michael
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 17 (04): : 740 - 752
  • [5] Learned Cardinality Estimation for Similarity Queries
    Sun, Ji
    Li, Guoliang
    Tang, Nan
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 1745 - 1757
  • [6] Cardinality estimation for the optimization of queries on ontologies
    Shironoshita, E. Patrick
    Ryan, Michael T.
    Kabuka, Mansur R.
    SIGMOD RECORD, 2007, 36 (02) : 13 - 18
  • [7] Semantic cardinality estimation for queries objects
    Nodine, MH
    Cherniack, M
    PERSISTENT OBJECT SYSTEMS: PRINCIPLES AND PRACTICE, 1997, : 164 - 173
  • [8] An In-place Framework for Exact and Approximate Shortest Unique Substring Queries
    Hon, Wing-Kai
    Thankachan, Sharma V.
    Xu, Bojian
    ALGORITHMS AND COMPUTATION, ISAAC 2015, 2015, 9472 : 755 - 767
  • [9] Cardinality estimation for property graph queries with gated learning approach on the graph database
    He, Zhenzhen
    Yu, Jiong
    Du, Xusheng
    Guo, Binglei
    Li, Ziyang
    Li, Zhe
    MULTIMEDIA TOOLS AND APPLICATIONS, 2025, 84 (11) : 9159 - 9183
  • [10] Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach
    Wang, Yaoshu
    Xiao, Chuan
    Qin, Jianbin
    Cao, Xin
    Sun, Yifang
    Wang, Wei
    Onizuka, Makoto
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 1197 - 1212