Automating localized learning for cardinality estimation based on XGBoost

Cited by: 0
Authors
Feng, Jieming [1 ,2 ]
Li, Zhanhuai [1 ,2 ]
Chen, Qun [1 ,2 ]
Liu, Hailong [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci & Engn, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Key Lab Big Data Storage & Management, Minist Ind & Informat Technol, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Self-driving DBMS; AI4DB; ML for cardinality estimation; Local models; Automation; SELECTIVITY ESTIMATION;
DOI
10.1007/s10115-024-02142-2
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
For cardinality estimation in a DBMS, building multiple local models instead of one global model usually improves estimation accuracy and reduces the effort of labeling large amounts of training data. Unfortunately, the existing approach to localized learning requires users to explicitly specify which query patterns each local model can handle. Making these decisions is arduous and error-prone; worse, it limits the usability of local models. In this paper, we propose a localized learning solution for cardinality estimation based on XGBoost, which automatically builds an optimal combination of local models for a given query workload. It consists of two phases: 1) model initialization; 2) model evolution. In the first phase, it clusters the training data into a set of coarse-grained query pattern groups based on pattern similarity and constructs a separate local model for each group. In the second phase, it iteratively merges and splits clusters, reconstructing the local models, to identify an optimal combination. We formulate the identification of the optimal combination of local models as a combinatorial optimization problem and, because of its exponential complexity, present an efficient heuristic algorithm, named MMS (Models Merging and Splitting), to solve it. Finally, we validate its performance superiority over existing learning alternatives through extensive experiments on real datasets.
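The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the paper's MMS algorithm: the abstract does not give its cost function, similarity measure, or search order, so everything below is an assumption made for illustration. In particular, a mean predictor stands in for an XGBoost local model, leave-one-out squared error stands in for the optimization objective, Jaccard similarity over predicate columns stands in for pattern similarity, and all names (`initialize`, `evolve`, `loo_error`, `SINGLETON_PENALTY`) are hypothetical.

```python
from itertools import combinations

# Assumed error charged to a local model trained on a single sample.
SINGLETON_PENALTY = 4.0

def jaccard(a, b):
    """Similarity of two query patterns (sets of predicate columns)."""
    return len(a & b) / len(a | b)

def loo_error(cluster):
    """Leave-one-out squared error of a mean predictor, a stand-in
    for the validation error of a real local model such as XGBoost."""
    ys = [y for _, y in cluster]
    n = len(ys)
    if n < 2:
        return SINGLETON_PENALTY
    total = sum(ys)
    return sum((y - (total - y) / (n - 1)) ** 2 for y in ys)

def total_cost(clusters):
    return sum(loo_error(c) for c in clusters)

def initialize(samples, threshold=0.5):
    """Phase 1 (sketch): coarse clustering by pattern similarity to the
    first pattern seen in each cluster."""
    clusters = []
    for pattern, y in samples:
        for c in clusters:
            if jaccard(pattern, c[0][0]) >= threshold:
                c.append((pattern, y))
                break
        else:
            clusters.append([(pattern, y)])
    return clusters

def neighbours(clusters):
    """All partitions reachable by one merge or one split."""
    for i, j in combinations(range(len(clusters)), 2):
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        yield rest + [clusters[i] + clusters[j]]
    for i, c in enumerate(clusters):
        patterns = {p for p, _ in c}
        if len(patterns) < 2:
            continue
        rest = clusters[:i] + clusters[i + 1:]
        for p in sorted(patterns, key=sorted):
            inside = [s for s in c if s[0] == p]
            outside = [s for s in c if s[0] != p]
            yield rest + [inside, outside]

def evolve(clusters):
    """Phase 2 (sketch): greedily apply the best merge or split until
    no single move lowers the total cost."""
    best = total_cost(clusters)
    while True:
        candidates = list(neighbours(clusters))
        if not candidates:
            return clusters
        candidate = min(candidates, key=total_cost)
        cost = total_cost(candidate)
        if cost >= best:
            return clusters
        clusters, best = candidate, cost

# Toy workload: (pattern, log-cardinality label); values are made up.
A, B = frozenset({"a", "b"}), frozenset({"a", "b", "c"})
C, D = frozenset({"x"}), frozenset({"y"})
samples = [(A, 1.0), (B, 5.0), (A, 1.1), (B, 5.2), (C, 3.0), (D, 3.1)]
result = evolve(initialize(samples))
print(sorted(sorted(y for _, y in c) for c in result))
```

On this toy workload, phase 1 lumps patterns A and B together because their column sets overlap, and leaves C and D as singletons. The greedy phase then splits A from B (their labels differ widely) and merges C with D (their labels are close and each has too little data alone), which mirrors the merge-and-split intuition described above. A greedy search like this only finds a local optimum of the assumed cost.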
Pages: 3825 - 3854
Page count: 30
Related papers
50 in total
  • [11] Cardinality estimation based on logical possible worlds
    Lin, Xudong
    Zeng, Xiaoning
    ICIC Express Letters, 2014, 8 (11): : 3215 - 3220
  • [12] Algebra-based XQuery cardinality estimation
    Sakr, Sherif
    INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2008, 4 (01) : 6 - +
  • [13] Cardinality Estimation of Approximate Substring Queries using Deep Learning
    Kwon, Suyong
    Jung, Woohwan
    Shim, Kyuseok
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 15 (11): : 3145 - 3157
  • [14] Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach
    Wang, Yaoshu
    Xiao, Chuan
    Qin, Jianbin
    Cao, Xin
    Sun, Yifang
    Wang, Wei
    Onizuka, Makoto
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 1197 - 1212
  • [15] A Cardinality Estimation Approach Based on Two Level Histograms
    Lin, Xudong
    Zeng, Xiaoning
    Pu, Xiaowei
    Sun, Yanyan
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2015, 31 (05) : 1733 - 1756
  • [16] Lightweight Cardinality Estimation in LSM-based Systems
    Absalyamov, Ildar
    Carey, Michael J.
    Tsotras, Vassilis J.
    SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018, : 841 - 855
  • [17] RFID Cardinality Estimation Approach Based on Multilayer Perceptron
    Xie X.
    Liu X.-L.
    Wang J.-X.
    Guo S.
    Li K.-Q.
    Jisuanji Xuebao/Chinese Journal of Computers, 2023, 46 (03): : 499 - 511
  • [18] A cardinality estimation approach based on two level histograms
    Department of Information Engineering, Environmental Management College of China, Qinhuangdao, Hebei
    066004, China
    Unknown
    066004, China
    J. Inf. Sci. Eng., 5 (1733-1756):
  • [19] Learning complex predicates for cardinality estimation using recursive neural networks
    Wang, Zhi
    Duan, Hancong
    Cheng, Yamin
    Min, Geyong
    INFORMATION SYSTEMS, 2024, 124
  • [20] QardEst: Using Quantum Machine Learning for Cardinality Estimation of Join Queries
    Kittelmann, Florian
    Sulimov, Pavel
    Stockinger, Kurt
    PROCEEDINGS OF THE 1ST WORKSHOP ON QUANTUM COMPUTING AND QUANTUM-INSPIRED TECHNOLOGY FOR DATA-INTENSIVE SYSTEMS AND APPLICATIONS, Q-DATA, CO-LOCATED WITH ACM INTERNATIONAL CONFERENCE ON DATA MANAGEMENT, SIGMOD, 2024, : 2 - 13