Automating localized learning for cardinality estimation based on XGBoost

Authors
Feng, Jieming [1, 2]
Li, Zhanhuai [1, 2]
Chen, Qun [1, 2]
Liu, Hailong [1, 2]
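Affiliations
[1] Northwestern Polytechnical University, School of Computer Science and Engineering, Xi'an 710072, People's Republic of China
[2] Northwestern Polytechnical University, Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Xi'an 710072, People's Republic of China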
Funding
National Natural Science Foundation of China
Keywords
Self-driving DBMS; AI4DB; ML for cardinality estimation; Local models; Automation; Selectivity estimation
DOI
10.1007/s10115-024-02142-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
For cardinality estimation in a DBMS, building multiple local models instead of one global model can usually improve estimation accuracy and reduce the effort of labeling large amounts of training data. Unfortunately, the existing approach to localized learning requires users to explicitly specify which query patterns each local model can handle. Making these decisions is arduous and error-prone for users; worse, it limits the usability of local models. In this paper, we propose a localized learning solution for cardinality estimation based on XGBoost, which automatically builds an optimal combination of local models for a given query workload. It consists of two phases: 1) model initialization and 2) model evolution. In the first phase, it clusters the training data into a set of coarse-grained query pattern groups based on pattern similarity and constructs a separate local model for each group. In the second phase, it iteratively merges and splits clusters, reconstructing the local models, to identify an optimal combination. We formulate the identification of the optimal combination of local models as a combinatorial optimization problem; because this problem has exponential complexity, we present an efficient heuristic algorithm, named MMS (Models Merging and Splitting), to solve it. Finally, extensive experiments on real datasets validate its performance superiority over existing learning alternatives.
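The abstract does not specify the similarity measure, the optimization objective, or the exact merge and split rules of MMS, so the following is only a minimal Python sketch of the two-phase idea under assumed choices. A query pattern is taken to be the set of attributes its predicates reference, and pattern similarity is the Jaccard index. Each local model is a hypothetical stand-in that predicts its cluster's mean log-cardinality, used here in place of an XGBoost regressor. The cost is squared error plus a fixed penalty `lam` per model. The names `init_clusters`, `cluster_error`, `cost`, and `mms` are illustrative and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean

# Assumption: a training sample is (pattern, log_cardinality), where a query
# pattern is the frozenset of attributes its predicates reference.


def jaccard(a, b):
    """Pattern similarity (assumed): Jaccard index of two attribute sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def init_clusters(samples, threshold=0.5):
    """Phase 1 sketch: put each sample into the first cluster whose seed
    pattern is at least `threshold`-similar, otherwise open a new cluster."""
    clusters = []  # list of (seed_pattern, members)
    for pattern, y in samples:
        for seed, members in clusters:
            if jaccard(pattern, seed) >= threshold:
                members.append((pattern, y))
                break
        else:
            clusters.append((pattern, [(pattern, y)]))
    return [members for _, members in clusters]


def cluster_error(members):
    """Hypothetical stand-in for training an XGBoost local model: predict the
    cluster's mean log-cardinality and return the sum of squared errors."""
    prediction = mean(y for _, y in members)
    return sum((y - prediction) ** 2 for _, y in members)


def cost(clusters, lam):
    """Assumed objective: total error plus a penalty `lam` per local model."""
    return sum(cluster_error(c) for c in clusters) + lam * len(clusters)


def mms(clusters, lam=1.0, max_iter=100):
    """Phase 2 sketch (greedy merge/split, not the paper's exact MMS):
    apply the single merge or split that lowers the cost most; stop when
    no move helps."""
    clusters = [list(c) for c in clusters]
    best = cost(clusters, lam)
    for _ in range(max_iter):
        candidates = []
        # Merge moves: combine any two clusters into one local model.
        for i, j in combinations(range(len(clusters)), 2):
            rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
            candidates.append(rest + [clusters[i] + clusters[j]])
        # Split moves: peel one query pattern off a multi-pattern cluster.
        for i, c in enumerate(clusters):
            rest = clusters[:i] + clusters[i + 1:]
            for p in {pattern for pattern, _ in c}:
                part = [s for s in c if s[0] == p]
                other = [s for s in c if s[0] != p]
                if other:
                    candidates.append(rest + [part, other])
        if not candidates:
            break
        candidate = min(candidates, key=lambda cs: cost(cs, lam))
        candidate_cost = cost(candidate, lam)
        if candidate_cost >= best - 1e-12:
            break
        clusters, best = candidate, candidate_cost
    return clusters, best


# Usage with toy data: A and B share an attribute, but A behaves like C.
A, B, C = frozenset({"age"}), frozenset({"age", "salary"}), frozenset({"dept"})
samples = [(A, 2.0), (A, 2.2), (B, 6.0), (B, 6.2), (C, 2.1), (C, 1.9)]
initial = init_clusters(samples)
final, final_cost = mms(initial, lam=1.0)
print(len(initial), len(final), round(final_cost, 2))
```

On this toy data, phase 1 groups A with B because they share the attribute `age`. The merge/split loop then splits B off and merges A with C, because A and C have similar cardinalities. This illustrates why a purely similarity-based initialization may need to be refined. The greedy best-move loop keeps each step polynomial in the number of clusters, at the cost of possibly stopping at a local optimum. That is the usual trade-off when a heuristic replaces an exponential search.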
Pages: 3825-3854
Number of pages: 30