Automating localized learning for cardinality estimation based on XGBoost

Authors
Feng, Jieming [1, 2]
Li, Zhanhuai [1, 2]
Chen, Qun [1, 2]
Liu, Hailong [1, 2]
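Affiliations
[1] Northwestern Polytechnical University, School of Computer Science and Engineering, Xi'an 710072, People's Republic of China
[2] Northwestern Polytechnical University, Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Xi'an 710072, People's Republic of China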
Funding
National Natural Science Foundation of China
Keywords
Self-driving DBMS; AI4DB; ML for cardinality estimation; Local models; Automation; Selectivity estimation
DOI
10.1007/s10115-024-02142-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
For cardinality estimation in a DBMS, building multiple local models instead of one global model usually improves estimation accuracy and reduces the effort of labeling large amounts of training data. Unfortunately, the existing approach to localized learning requires users to specify explicitly which query patterns each local model can handle. Making these decisions is arduous and error-prone for users, and it limits the usability of local models. In this paper, we propose a localized learning solution for cardinality estimation based on XGBoost that automatically builds an optimal combination of local models for a given query workload. It consists of two phases: 1) model initialization and 2) model evolution. In the first phase, it clusters the training data into a set of coarse-grained query pattern groups based on pattern similarity and constructs a separate local model for each group. In the second phase, it iteratively merges and splits clusters, reconstructing the local models, to identify an optimal combination. We formulate the identification of the optimal combination of local models as a combinatorial optimization problem. Because this problem has exponential complexity, we present an efficient heuristic algorithm, named MMS (Models Merging and Splitting), to solve it. Finally, extensive experiments on real datasets validate its superior performance over existing learning alternatives.
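The two phases described in the abstract can be illustrated with a small sketch. This is not the authors' MMS algorithm, and it does not use XGBoost. It makes three assumptions for illustration only: phase 1 groups queries by an exact pattern label instead of by pattern similarity; each local model is a least-squares line, used as a stand-in for a gradient-boosted regressor; and phase 2 covers only the merge direction, greedily merging the pair of clusters that most reduces the absolute error on held-out queries. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def group_by_pattern(samples):
    """Phase 1 (simplified assumption): one group per exact pattern label."""
    groups = defaultdict(list)
    for pattern, x, card in samples:
        groups[pattern].append((x, card))
    return dict(groups)


def fit_line(points):
    """Stand-in local model (not XGBoost): least-squares line card = a*x + b."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    # Fall back to a constant model when x has no spread.
    a = sum((x - mx) * (y - my) for x, y in points) / var if var else 0.0
    return a, my - a * mx


def cluster_cost(train_pts, val_pts):
    """Absolute error on held-out queries of a model fitted on train_pts."""
    a, b = fit_line(train_pts)
    return sum(abs(a * x + b - y) for x, y in val_pts)


def merge_greedily(train, val):
    """Phase 2 (merge direction only): merge the best pair while error drops."""
    clusters = {frozenset([p]): (train[p], val[p]) for p in train}
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            (t1, v1), (t2, v2) = clusters[c1], clusters[c2]
            gain = (cluster_cost(t1, v1) + cluster_cost(t2, v2)
                    - cluster_cost(t1 + t2, v1 + v2))
            if gain > 1e-9 and (best is None or gain > best[0]):
                best = (gain, c1, c2)
        if best is None:  # no merge lowers the held-out error
            break
        _, c1, c2 = best
        (t1, v1), (t2, v2) = clusters.pop(c1), clusters.pop(c2)
        clusters[c1 | c2] = (t1 + t2, v1 + v2)
    return clusters


# Toy workload: (pattern label, one numeric query feature, true cardinality).
train = group_by_pattern([("A", 1, 3), ("A", 3, 5), ("B", 1, 1),
                          ("B", 3, 7), ("C", 1, 50), ("C", 3, 40)])
val = group_by_pattern([("A", 10, 20), ("B", 10, 20), ("C", 5, 30)])
for members in merge_greedily(train, val):
    print(sorted(members))
```

In this toy data, patterns A and B follow the same trend but each has too few noisy training points to fit it alone, so a shared model generalizes better and the two are merged, while C follows a different trend and keeps its own model. A faithful implementation would also need the split step and a similarity-based initial clustering, neither of which is shown here.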
Pages: 3825-3854
Number of pages: 30