Confidence bounds for sampling-based GROUP BY estimates

Cited by: 5
Authors
Xu, Fei [1]
Jermaine, Christopher [1]
Dobra, Alin [1]
Affiliation
[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2008, Vol. 33, No. 3
Keywords
algorithms; theory; reliability; approximate query processing; multiple hypothesis testing; sampling
DOI
10.1145/1386118.1386122
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sampling is now a very important data management tool, to such an extent that an interface for database sampling is included in the latest SQL standard. In this article we reconsider in depth what at first may seem like a very simple problem: computing the error of a sampling-based guess for the answer to a GROUP BY query over a multitable join. The difficulty when sampling for the answer to such a query is that the same sample will be used to guess the result of the query for each group, which induces correlations among the estimates. Thus, from a statistical point of view it is very problematic and even dangerous to use traditional methods such as confidence intervals for communicating estimate accuracy to the user. We explore ways to address this problem, and pay particular attention to the computational aspects of computing "safe" confidence intervals.
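To make the multiple-testing concern in the abstract concrete, the following minimal sketch contrasts ordinary per-group confidence intervals with Bonferroni-widened ones for a sampled GROUP BY average. It is an illustration only, not the method developed in the article: the Bonferroni correction is a standard conservative technique that holds under arbitrary correlation between groups, and it is swapped in here for simplicity. The function name `groupby_mean_intervals`, the single-table uniform sample, and the normal approximation are all assumptions made for this example; the article itself treats multitable joins.

```python
import math
from statistics import NormalDist, mean, stdev


def groupby_mean_intervals(sample, confidence=0.95):
    """Estimate a per-group average from one shared sample.

    sample: list of (group, value) pairs, assumed drawn uniformly at random
            from a single table (hypothetical simplification).
    Returns {group: (estimate, single_half_width, joint_half_width)}.
    Every group needs at least two sampled values.
    """
    groups = {}
    for g, v in sample:
        groups.setdefault(g, []).append(v)

    k = len(groups)
    alpha = 1.0 - confidence
    # Quantile for one interval considered in isolation.
    z_single = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Bonferroni: split the error budget alpha evenly over the k groups,
    # so that all k intervals hold simultaneously with prob. >= confidence.
    z_joint = NormalDist().inv_cdf(1.0 - alpha / (2.0 * k))

    result = {}
    for g, vals in groups.items():
        # Standard error of the group mean (normal approximation).
        se = stdev(vals) / math.sqrt(len(vals))
        result[g] = (mean(vals), z_single * se, z_joint * se)
    return result
```

With more than one group, the joint half-width is always larger than the per-group half-width. That gap is the price of a guarantee covering every group at once rather than each group separately.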
Pages: 44