The Analytical Bootstrap: a New Method for Fast Error Estimation in Approximate Query Processing

Cited by: 47
Authors
Zeng, Kai [1]
Gao, Shi [1]
Mozafari, Barzan [2]
Zaniolo, Carlo [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA);
Keywords
Approximate Query Processing; Error Estimation; Bootstrap; UNCERTAIN; AGGREGATION;
DOI
10.1145/2588555.2588579
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Sampling is one of the most commonly used techniques in Approximate Query Processing (AQP), an area of research made more critical by the need for timely and cost-effective analytics over "Big Data". Assessing the quality (i.e., estimating the error) of approximate answers is essential for meaningful AQP, and the two main approaches used in the past to address this problem are based on either (i) analytic error quantification or (ii) the bootstrap method. The first approach is extremely efficient but lacks generality, whereas the second is quite general but suffers from high computational overhead. In this paper, we introduce a probabilistic relational model for the bootstrap process, along with rigorous semantics and a unified error model, which bridges the gap between these two traditional approaches. Based on our probabilistic framework, we develop efficient algorithms to predict the distribution of the approximation results. These enable the computation of any bootstrap-based quality measure for a large class of SQL queries via a single-round evaluation of a slightly modified query. Extensive experiments on both synthetic and real-world datasets show that our method has superior prediction accuracy for bootstrap-based quality measures, and is several orders of magnitude faster than bootstrap.
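The contrast the abstract draws between analytic error quantification and the bootstrap can be illustrated with a minimal sketch for the simplest case, the error of an AVG computed over a sample. This is not the paper's analytical bootstrap. It only shows the two traditional approaches side by side: a closed-form standard error (cheap, but specific to the mean) and the classical resampling bootstrap (general, but requiring many re-evaluations of the aggregate). The function names, sample size, and resample count are all assumptions chosen for illustration.

```python
import random
import statistics


def analytic_stderr(sample):
    """Closed-form standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


def bootstrap_stderr(sample, n_resamples=1000, seed=0):
    """Classical bootstrap: resample with replacement, recompute the
    aggregate each time, and take the spread of the recomputed values."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # draw n rows with replacement
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)


# Hypothetical data standing in for a sample drawn from a large table.
data_rng = random.Random(42)
sample = [data_rng.gauss(100.0, 15.0) for _ in range(500)]

analytic = analytic_stderr(sample)    # one pass over the sample
bootstrap = bootstrap_stderr(sample)  # 1000 passes over resamples
print(f"analytic:  {analytic:.4f}")
print(f"bootstrap: {bootstrap:.4f}")
```

For the mean, the two estimates should roughly agree, while the bootstrap variant evaluates the aggregate once per resample. That repeated evaluation is the computational overhead the abstract refers to, and the closed form's restriction to a single aggregate is its lack of generality.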
Pages: 277-288
Page count: 12