Confidence bounds for sampling-based GROUP BY estimates

Cited by: 5
Authors
Xu, Fei [1]
Jermaine, Christopher [1]
Dobra, Alin [1]
Affiliation
[1] University of Florida, Department of Computer and Information Science and Engineering, Gainesville, FL 32611, USA
Source
ACM Transactions on Database Systems | 2008 / Vol. 33 / No. 3
Keywords
algorithms; theory; reliability; approximate query processing; multiple hypothesis testing; sampling
DOI
10.1145/1386118.1386122
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sampling is now a very important data management tool, to such an extent that an interface for database sampling is included in the latest SQL standard. In this article we reconsider in depth what at first may seem like a very simple problem: computing the error of a sampling-based guess for the answer to a GROUP BY query over a multitable join. The difficulty when sampling for the answer to such a query is that the same sample will be used to guess the result of the query for each group, which induces correlations among the estimates. Thus, from a statistical point of view it is very problematic and even dangerous to use traditional methods such as confidence intervals for communicating estimate accuracy to the user. We explore ways to address this problem, and pay particular attention to the computational aspects of computing "safe" confidence intervals.
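To make the concern in the abstract concrete, the sketch below contrasts ordinary per-group confidence intervals with intervals widened by a Bonferroni correction, a textbook multiple-testing adjustment that remains valid when the per-group estimates are correlated. It is not the method developed in the article. The synthetic single-table data, the use of per-group means rather than a join aggregate, the normal approximation, and the names `group_intervals`, `table`, `naive`, and `safe` are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def group_intervals(sample, alpha=0.05, simultaneous=False):
    """Return {group: (low, high)} normal-approximation intervals for group means.

    sample is a list of (group_key, value) pairs taken from ONE shared sample.
    With simultaneous=True, alpha is divided by the number of groups
    (Bonferroni), so all intervals are intended to hold jointly with
    probability of roughly 1 - alpha or more.
    """
    groups = {}
    for key, value in sample:
        groups.setdefault(key, []).append(value)

    level = alpha / len(groups) if simultaneous else alpha
    z = NormalDist().inv_cdf(1 - level / 2)

    intervals = {}
    for key, values in groups.items():
        center = mean(values)
        half_width = z * stdev(values) / math.sqrt(len(values))
        intervals[key] = (center - half_width, center + half_width)
    return intervals


# Hypothetical data: 20 groups of 1000 rows each, values around 100.
random.seed(0)
table = [(g, random.gauss(100.0, 15.0)) for g in range(20) for _ in range(1000)]

# One sample serves every group, as in a sampling-based GROUP BY estimate.
sample = random.sample(table, 2000)

naive = group_intervals(sample)                     # each interval at 95% on its own
safe = group_intervals(sample, simultaneous=True)   # all intervals at 95% jointly
```

With 20 groups, each naive interval covers its own group's true mean about 95% of the time, but the chance that all 20 cover simultaneously is noticeably lower. The Bonferroni intervals are wider in exchange for a joint guarantee, which is one simple reading of the "safe" intervals the abstract refers to.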
Pages: 44