Confidence bounds for sampling-based GROUP BY estimates

Cited by: 5
Authors
Xu, Fei [1]
Jermaine, Christopher [1]
Dobra, Alin [1]
Affiliations
[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2008 / Vol. 33 / Issue 03
Keywords
algorithms; theory; reliability; approximate query processing; multiple hypothesis testing; sampling
DOI
10.1145/1386118.1386122
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sampling is now a very important data management tool, to such an extent that an interface for database sampling is included in the latest SQL standard. In this article we reconsider in depth what at first may seem like a very simple problem: computing the error of a sampling-based guess for the answer to a GROUP BY query over a multitable join. The difficulty when sampling for the answer to such a query is that the same sample will be used to guess the result of the query for each group, which induces correlations among the estimates. Thus, from a statistical point of view it is very problematic and even dangerous to use traditional methods such as confidence intervals for communicating estimate accuracy to the user. We explore ways to address this problem, and pay particular attention to the computational aspects of computing "safe" confidence intervals.
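To illustrate the issue the abstract raises, here is a minimal sketch, not the article's method. It contrasts naive per-group confidence intervals with intervals widened by a Bonferroni correction, a standard conservative device from multiple hypothesis testing that remains valid when the per-group estimates are correlated. The function name `group_avg_intervals`, its parameters, and the choice of AVG as the aggregate are all hypothetical. The sketch assumes the rows are a uniform random sample and that a normal approximation is acceptable for each group.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, mean, stdev


def group_avg_intervals(sample, alpha=0.05, simultaneous=True):
    """Normal-approximation intervals for each group's AVG (hypothetical helper).

    sample: iterable of (group_key, value) rows, assumed drawn uniformly at random.
    simultaneous=True applies a Bonferroni correction so that all intervals
    hold jointly with probability at least about 1 - alpha, regardless of
    how the per-group estimates are correlated.
    """
    groups = defaultdict(list)
    for key, value in sample:
        groups[key].append(value)

    k = len(groups)
    if k == 0:
        return {}

    # Bonferroni: split the total error budget alpha evenly across the k groups.
    per_group_alpha = alpha / k if simultaneous else alpha
    z = NormalDist().inv_cdf(1 - per_group_alpha / 2)

    intervals = {}
    for key, values in groups.items():
        if len(values) < 2:
            continue  # cannot estimate a variance from a single row
        center = mean(values)
        half_width = z * stdev(values) / sqrt(len(values))
        intervals[key] = (center - half_width, center + half_width)
    return intervals


rows = [("a", 1.0), ("a", 2.0), ("a", 3.0), ("b", 10.0), ("b", 12.0), ("b", 14.0)]
naive = group_avg_intervals(rows, simultaneous=False)
safe = group_avg_intervals(rows, simultaneous=True)
```

The corrected intervals in `safe` are wider than those in `naive`, which is the price of a guarantee that covers every group at once. Bonferroni is simple but can be loose when there are many groups.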
Pages: 44