Sample Amplification: Increasing Dataset Size even when Learning is Impossible

Citations: 0
Authors
Axelrod, Brian [1 ]
Garg, Shivam [1 ]
Sharan, Vatsal [1 ]
Valiant, Gregory [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
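Funding
National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes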
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Given data drawn from an unknown distribution, D, to what extent is it possible to "amplify" this dataset and faithfully output an even larger set of samples that appear to have been drawn from D? We formalize this question as follows: an (n, m) amplification procedure takes as input n independent draws from an unknown distribution D, and outputs a set of m > n "samples" which must be indistinguishable from m samples drawn i.i.d. from D. We consider this sample amplification problem in two fundamental settings: the case where D is an arbitrary discrete distribution supported on k elements, and the case where D is a d-dimensional Gaussian with unknown mean and fixed covariance matrix. Perhaps surprisingly, we show that a valid amplification procedure exists in both of these settings, even in the regime where the size of the input dataset, n, is significantly smaller than what would be necessary to learn D to non-trivial accuracy. We also show that our procedures are optimal up to constant factors. Beyond these results, we describe potential applications of sample amplification, and formalize a number of curious directions for future research.
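To make the (n, m) interface in the abstract concrete, here is a minimal sketch for the discrete setting. It is an illustrative assumption, not the authors' procedure as stated in this record: the function name `amplify_discrete` and the strategy (append m - n draws from the empirical distribution of the input, then shuffle) are hypothetical choices, and the sketch makes no claim about how large m can be while the output remains indistinguishable from m i.i.d. draws.

```python
import random


def amplify_discrete(samples, m, rng=None):
    """Hypothetical (n, m) amplifier sketch for discrete data.

    Keeps the n input samples, adds m - n draws from their empirical
    distribution, and shuffles so the added points are not positioned
    at the end of the output.
    """
    rng = rng or random.Random()
    samples = list(samples)
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one input sample")
    if m < n:
        raise ValueError("m must be at least n")
    # Assumption: extra points are resampled uniformly from the input.
    extra = [rng.choice(samples) for _ in range(m - n)]
    out = samples + extra
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.randrange(1000) for _ in range(50)]  # n = 50 draws, k = 1000
    amplified = amplify_discrete(data, 55, rng)      # m = 55
    print(len(amplified))
```

The output always contains the original samples and only values already seen in the input, so whether it passes as m genuine draws depends on how m - n compares with n and k.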
Pages: 10