Subsampling Suffices for Adaptive Data Analysis

Cited by: 3
Author
Blanc, Guy [1]
Affiliation
[1] Stanford Univ, Stanford, CA 94305 USA
Keywords
Adaptive data analysis; statistical queries; information theory
DOI
10.1145/3564246.3585226
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down in the common setting where a dataset is reused for multiple, adaptively chosen, queries. This problem of adaptive data analysis was formalized in the seminal works of Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014). We identify a remarkably simple set of assumptions under which the queries will continue to be representative even when chosen adaptively: The only requirements are that each query takes as input a random subsample and outputs few bits. This result shows that the noise inherent in subsampling is sufficient to guarantee that query responses generalize. The simplicity of this subsampling-based framework allows it to model a variety of real-world scenarios not covered by prior work. In addition to its simplicity, we demonstrate the utility of this framework by designing mechanisms for two foundational tasks, statistical queries and median finding. In particular, our mechanism for answering the broadly applicable class of statistical queries is both extremely simple and state of the art in many parameter regimes.
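The two requirements named in the abstract (each query sees only a random subsample and outputs few bits) can be illustrated with a minimal Python sketch. This is not the paper's mechanism as published; it is one assumed instantiation in which every sub-query reads a single uniformly drawn point and emits one bit, and the bits are averaged. The function name `answer_statistical_query` and the parameter `k` are hypothetical and chosen only for illustration.

```python
import random


def answer_statistical_query(dataset, phi, k=200, rng=None):
    """Estimate the mean of phi over `dataset` from k one-bit votes.

    Assumptions (not taken from the source): phi maps a data point to a
    value in [0, 1]; each vote looks at a subsample of size 1 and outputs
    a single bit via randomized rounding.
    """
    rng = rng or random.Random()
    votes = 0
    for _ in range(k):
        x = rng.choice(dataset)           # random subsample of size 1
        p = min(1.0, max(0.0, phi(x)))    # clamp phi(x) into [0, 1]
        if rng.random() < p:              # randomized rounding to one bit
            votes += 1
    return votes / k                      # average of the k bits


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0, 1] * 500
    # The analyst may pick the next query after seeing earlier answers.
    first = answer_statistical_query(data, lambda x: float(x), k=2000, rng=rng)
    threshold = first
    second = answer_statistical_query(
        data, lambda x: 1.0 if x > threshold else 0.0, k=2000, rng=rng
    )
    print(first, second)
```

In this sketch the only randomness comes from subsampling and rounding; no additional noise is injected into the answer, which mirrors the abstract's claim that the noise inherent in subsampling is what drives generalization.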
Pages: 999-1012
Number of pages: 14