Analysis of variance of cross-validation estimators of the generalization error

Authors
Markatou, M [1]
Tian, H
Biswas, S
Hripcsak, G
Affiliations
[1] Columbia Univ, Dept Biostat, New York, NY 10032 USA
[2] Columbia Univ, Coll Biomed Informat, New York, NY 10032 USA
Keywords
cross-validation; generalization error; moment approximation; prediction; variance estimation
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper brings together methods from two different disciplines: statistics and machine learning. We address the problem of estimating the variance of cross-validation (CV) estimators of the generalization error. In particular, we approach the variance estimation of CV estimators of the generalization error as a problem of approximating the moments of a statistic. The approximation illustrates the role of training and test sets in the performance of the algorithm. It provides a unifying approach to evaluating the various methods used to obtain training and test sets, and it takes into account the variability due to different training and test sets. For the simple problem of predicting the sample mean and in the case of smooth loss functions, we show that the variance of the CV estimator of the generalization error is a function of the moments of the random variables $Y = \mathrm{Card}(S_j \cap S_{j'})$ and $Y^* = \mathrm{Card}(S_j^c \cap S_{j'}^c)$. Here $S_j$ and $S_{j'}$ are two training sets, and $S_j^c$ and $S_{j'}^c$ are the corresponding test sets. We prove that the distributions of $Y$ and $Y^*$ are hypergeometric, and we compare our estimator with the one proposed by Nadeau and Bengio (2003). We extend these results to the regression case and to the case of absolute error loss, and indicate how the methods can be extended to the classification case. We illustrate the results through simulation.
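The hypergeometric claim can be checked numerically. The following is a minimal Python sketch, not the authors' code. It assumes the two training sets $S_j$ and $S_{j'}$ are drawn independently and uniformly at random, without replacement, as subsets of size $n_1$ from $n$ observations, as in repeated random splitting. The test sets then have size $n_2 = n - n_1$. The function names `overlap_pmf` and `simulate_overlaps` and the sizes $n = 20$, $n_1 = 12$ are hypothetical choices for illustration.

Under this assumption, $Y$ is hypergeometric with population $n$, $n_1$ marked items and $n_1$ draws. Because $Y^* = n - \mathrm{Card}(S_j \cup S_{j'}) = n - 2n_1 + Y$, the test-set overlap $Y^*$ is hypergeometric with $n_2$ in place of $n_1$. The expected training-set overlap is $E[Y] = n_1^2/n$.

```python
import math
import random
from collections import Counter


def overlap_pmf(n: int, n1: int, k: int) -> float:
    """Hypergeometric pmf P(Y = k) for the overlap of two size-n1 subsets.

    Assumption (illustrative): S_j and S_j' are drawn independently and
    uniformly at random, without replacement, from n observations.
    Conditioning on S_j, the set S_j' is n1 draws from n items of which n1
    are "marked" (those in S_j), so Y ~ Hypergeometric(n, n1 marked, n1 draws).
    """
    lo, hi = max(0, 2 * n1 - n), n1  # support of Y
    if k < lo or k > hi:
        return 0.0
    return math.comb(n1, k) * math.comb(n - n1, n1 - k) / math.comb(n, n1)


def simulate_overlaps(n: int, n1: int, reps: int, seed: int = 0):
    """Monte Carlo counts of Y (training overlap) and Y* (test overlap)."""
    rng = random.Random(seed)
    y_counts, ystar_counts = Counter(), Counter()
    for _ in range(reps):
        train_a = set(rng.sample(range(n), n1))
        train_b = set(rng.sample(range(n), n1))
        y = len(train_a & train_b)
        # Test sets are the complements, so |S_j^c ∩ S_j'^c| = n - |S_j ∪ S_j'|.
        ystar = n - len(train_a | train_b)
        # Inclusion-exclusion: |S_j ∪ S_j'| = 2*n1 - Y, hence Y* = n - 2*n1 + Y.
        assert ystar == n - 2 * n1 + y
        y_counts[y] += 1
        ystar_counts[ystar] += 1
    return y_counts, ystar_counts


if __name__ == "__main__":
    n, n1 = 20, 12   # hypothetical sizes, not taken from the paper
    n2 = n - n1      # test-set size
    reps = 50_000
    y_counts, ystar_counts = simulate_overlaps(n, n1, reps)

    print(" k  P(Y=k) exact  empirical")
    for k in range(n1 + 1):
        print(f"{k:2d}  {overlap_pmf(n, n1, k):.4f}        {y_counts[k] / reps:.4f}")

    # Y* is hypergeometric as well, with n2 in place of n1.
    print(" m  P(Y*=m) exact  empirical")
    for m in range(n2 + 1):
        print(f"{m:2d}  {overlap_pmf(n, n2, m):.4f}         {ystar_counts[m] / reps:.4f}")

    # The exact mean of Y equals n1^2 / n.
    mean_y = sum(k * overlap_pmf(n, n1, k) for k in range(n1 + 1))
    print("E[Y] =", mean_y, "vs n1^2/n =", n1 ** 2 / n)
```

Running the script prints the exact and simulated probabilities side by side. The sketch only checks the distribution of the overlaps under the stated sampling assumption. It does not reproduce the paper's moment approximations or its variance estimator.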
Pages: 1127-1168
Number of pages: 42
Related papers
  • [1] ANALYSIS AND ESTIMATION OF THE VARIANCE OF CROSS-VALIDATION ESTIMATORS OF THE GENERALIZATION ERROR: A SHORT REVIEW
    Markatou, Marianthi
    Dimova, Rositsa
    Sinha, Anshu
    [J]. FRONTIERS OF APPLIED AND COMPUTATIONAL MATHEMATICS, 2008, : 206 - +
  • [2] A Comparison of Estimators for the Variance of Cross-Validation Estimators of the Generalization Error of Computer Algorithms
    Markatou, Marianthi
    Dimova, Rositsa
    Sinha, Anshu
    [J]. NONPARAMETRIC STATISTICS AND MIXTURE MODELS: A FESTSCHRIFT IN HONOR OF THOMAS P HETTMANSPERGER, 2011, : 226 - 251
  • [3] Error estimation based on variance analysis of k-fold cross-validation
    Jiang, Gaoxia
    Wang, Wenjian
    [J]. PATTERN RECOGNITION, 2017, 69 : 94 - 106
  • [4] Quantification of the Impact of Feature Selection on the Variance of Cross-Validation Error Estimation
    Xiao, Yufei
    Hua, Jianping
    Dougherty, Edward R.
    [J]. EURASIP JOURNAL ON BIOINFORMATICS AND SYSTEMS BIOLOGY, 2007, (01):
  • [5] A THEORY OF CROSS-VALIDATION ERROR
    TURNEY, P
    [J]. JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 1994, 6 (04) : 361 - 391
  • [7] Probabilistic Cross-Validation Estimators for Gaussian Process Regression
    Martino, Luca
    Laparra, Valero
    Camps-Valls, Gustau
    [J]. 2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 823 - 827
  • [8] COMPARATIVE ANALYSIS OF DIFFERENT CROSS-VALIDATION BANDWIDTH SELECTORS IN KERNEL REGRESSION ESTIMATORS
    Zhang, Yu-Min
    [J]. PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 502 - 509
  • [9] Corrected generalized cross-validation for finite ensembles of penalized estimators
    Bellec, Pierre C.
    Du, Jin-Hong
    Koriyama, Takuya
    Patil, Pratik
    Tan, Kai
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2024,
  • [10] Honest leave-one-out cross-validation for estimating post-tuning generalization error
    Wang, Boxiang
    Zou, Hui
    [J]. STAT, 2021, 10 (01):