MISS: finding optimal sample sizes for approximate analytics

Cited by: 0
Authors
Xuebin Su
Hongzhi Wang
Affiliation
[1] Harbin Institute of Technology & Peng Cheng Lab,
Source
Keywords
OLAP; Approximate Query Processing; Sampling; Bootstrapping; Optimization;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Sampling-based Approximate Query Processing (AQP) is widely regarded as a promising way to achieve interactivity in big data analytics. To build such an AQP system, finding the minimal sample size for a query under given error constraints, called Sample Size Optimization (SSO), is an essential yet unsolved problem. Ideally, a solution to the SSO problem achieves statistical accuracy, computational efficiency, and broad applicability all at the same time. Existing approaches either make idealistic assumptions about the statistical properties of the query or disregard them completely. This may result in overemphasizing one of the three goals while neglecting the others. To overcome these limitations, we first carefully examine the statistical properties shared by common analytical queries. Based on these properties, we propose a linear model, called the error model, that describes the relationship between sample sizes and the approximation errors of a query. We then propose a Model-guided Iterative Sample Selection (MISS) framework to solve the SSO problem in general. Building on the MISS framework, we propose a concrete algorithm, called $L^{2}\textsc{Miss}$, to find optimal sample sizes under the $L^{2}$ norm error metric. Moreover, we extend the $L^{2}\textsc{Miss}$ algorithm to handle other error metrics. Finally, we show theoretically and empirically that the $L^{2}\textsc{Miss}$ algorithm and its extensions achieve satisfactory accuracy and efficiency for a considerably wide range of analytical queries.
Pages: 165-200
Page count: 35
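The abstract's general idea (estimate the error at a few sample sizes, fit a linear error model, and let the model propose the next sample size) can be illustrated with a minimal sketch. This is not the paper's $L^{2}\textsc{Miss}$ algorithm. It assumes the query is a simple mean, that the error is a bootstrap standard error, and that the linear model relates log(error) to log(sample size); the abstract does not specify any of these. All function names and parameters below are hypothetical.

```python
import math
import random
import statistics


def bootstrap_error(sample, n_boot, rng):
    """Bootstrap estimate of the standard error of the sample mean."""
    n = len(sample)
    means = []
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return statistics.pstdev(means)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def find_sample_size(data, target_error, init_size=200, n_boot=100,
                     max_iter=10, seed=0):
    """Iteratively grow the sample until the bootstrap error meets the target.

    Assumed error model (illustrative only): log(err) = a + b * log(n).
    Returns the last evaluated sample size and its estimated error.
    """
    rng = random.Random(seed)
    size = init_size
    log_sizes, log_errors = [], []
    evaluated, err = 0, float("inf")
    for _ in range(max_iter):
        size = min(size, len(data))
        err = bootstrap_error(rng.sample(data, size), n_boot, rng)
        evaluated = size
        if err <= target_error or size == len(data):
            break
        log_sizes.append(math.log(size))
        log_errors.append(math.log(err))
        if len(log_sizes) >= 2:
            a, b = fit_line(log_sizes, log_errors)
        else:
            # With a single observation, assume the usual 1/sqrt(n) decay.
            b = -0.5
            a = log_errors[0] - b * log_sizes[0]
        grown = math.ceil(size * 1.25)  # fallback growth guarantees progress
        if b < 0:
            # Invert the model for the target error, capped at the data size.
            log_n = min((math.log(target_error) - a) / b, math.log(len(data)))
            size = max(grown, math.ceil(math.exp(log_n)))
        else:
            size = grown
    return evaluated, err
```

For example, calling `find_sample_size(data, target_error=0.05, init_size=50)` on a list of numeric values returns a sample size together with the bootstrap error measured at that size. The 1.25 growth factor is an arbitrary safeguard for cases where the fitted model proposes a size no larger than the current one.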