Confidence bounds for sampling-based GROUP BY estimates

Cited by: 5
Authors
Xu, Fei [1]
Jermaine, Christopher [1]
Dobra, Alin [1]
Affiliations
[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2008 / Vol. 33 / No. 3
Keywords
algorithms; theory; reliability; approximate query processing; multiple hypothesis testing; sampling
DOI
10.1145/1386118.1386122
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sampling is now a very important data management tool, to such an extent that an interface for database sampling is included in the latest SQL standard. In this article we reconsider in depth what at first may seem like a very simple problem: computing the error of a sampling-based guess for the answer to a GROUP BY query over a multitable join. The difficulty when sampling for the answer to such a query is that the same sample will be used to guess the result of the query for each group, which induces correlations among the estimates. Thus, from a statistical point of view it is very problematic and even dangerous to use traditional methods such as confidence intervals for communicating estimate accuracy to the user. We explore ways to address this problem, and pay particular attention to the computational aspects of computing "safe" confidence intervals.
Pages: 44
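
The abstract's concern, that one sample yields correlated per-group estimates and so ordinary per-group confidence intervals overstate joint reliability, can be illustrated with a minimal Python sketch. It is not the method of the article: it uses a plain Bonferroni correction (a standard, conservative multiple-testing adjustment that holds under arbitrary dependence), a normal approximation, and a uniform sample from a single table rather than a multitable join. The function name `group_intervals` and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def group_intervals(sample, alpha=0.05, simultaneous=True):
    """Normal-approximation confidence intervals for per-group means.

    sample: iterable of (group_key, value) pairs, assumed drawn uniformly
    at random. With simultaneous=True, alpha is split evenly across the
    groups (Bonferroni), so that, up to the normal approximation, the
    chance that ANY interval misses its true mean is at most alpha,
    whatever the correlation between the group estimates.
    """
    by_group = defaultdict(list)
    for key, value in sample:
        by_group[key].append(value)

    # Groups with fewer than two sampled rows have no variance estimate.
    usable = {k: v for k, v in by_group.items() if len(v) >= 2}
    if not usable:
        return {}

    per_group_alpha = alpha / len(usable) if simultaneous else alpha
    z = NormalDist().inv_cdf(1 - per_group_alpha / 2)

    intervals = {}
    for key, values in usable.items():
        centre = mean(values)
        half_width = z * stdev(values) / len(values) ** 0.5
        intervals[key] = (centre - half_width, centre + half_width)
    return intervals


if __name__ == "__main__":
    # Synthetic data for illustration only: 10 groups, 2000 sampled rows.
    rng = random.Random(0)
    rows = [(f"g{rng.randrange(10)}", rng.gauss(100, 15)) for _ in range(2000)]
    naive = group_intervals(rows, simultaneous=False)
    safe = group_intervals(rows, simultaneous=True)
    for key in sorted(naive):
        print(key, naive[key], safe[key])
```

The corrected intervals are wider than the naive ones for every group; that extra width is the price of a guarantee that covers all groups at once rather than each group separately.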