The uncertainty principle of cross-validation

Cited by: 6
Author
Last, Mark [1]
Affiliation
[1] Ben Gurion Univ Negev, Dept Informat Syst Eng, IL-84105 Beer Sheva, Israel
Keywords
cross-validation; accuracy estimation; model selection; classification; info-fuzzy networks
DOI
10.1109/GRC.2006.1635796
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data miners often have to deal with data sets of limited size due to economic, timing, and other constraints. Usually their task is twofold: to induce the most accurate model from a given dataset and to estimate the model's accuracy on future (unseen) examples. Cross-validation is the most common approach to estimating the true accuracy of a given model; it is based on splitting the available sample between a training set and a validation set. Practical experience shows that any cross-validation method suffers from either an optimistic or a pessimistic bias in some domains. In this paper, we present a series of large-scale experiments on artificial and real-world datasets, in which we study the relationship between a model's true accuracy and its cross-validation estimator. Two stable classification algorithms (ID3 and info-fuzzy network) are used for inducing each model. The results of our experiments bear a striking resemblance to the well-known Heisenberg Uncertainty Principle: the more accurate a model induced from a small amount of real-world data, the less reliable the simultaneously measured cross-validation estimates. We suggest calling this phenomenon "the uncertainty principle of cross-validation".
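As an illustration of the train/validation splitting that the abstract describes, the following is a minimal k-fold cross-validation sketch. It is not the paper's experimental setup: the paper induces models with ID3 and info-fuzzy networks, whereas this sketch substitutes a one-dimensional 1-nearest-neighbour classifier as a stand-in learner, and the names `k_fold_indices`, `one_nn`, and `cross_val_accuracy`, as well as the toy data, are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def one_nn(train_x, train_y):
    """Stand-in learner (an assumption, not ID3 or an info-fuzzy network):
    predict the label of the closest training point on the real line."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        return train_y[nearest]
    return predict


def cross_val_accuracy(xs, ys, k=5, seed=0):
    """Return the mean validation accuracy and the per-fold accuracies."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        # Train on everything outside the current fold.
        train = [i for i in range(len(xs)) if i not in held_out]
        predict = one_nn([xs[i] for i in train], [ys[i] for i in train])
        # Validate on the held-out fold only.
        hits = sum(predict(xs[i]) == ys[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / k, scores


# Toy data (hypothetical): two well-separated classes on the real line.
xs = [float(v) for v in range(10)] + [float(v) for v in range(100, 110)]
ys = [0] * 10 + [1] * 10
mean_acc, fold_scores = cross_val_accuracy(xs, ys, k=5)
print(mean_acc, fold_scores)
```

The mean of the fold scores plays the role of the cross-validation estimator, and the spread of the individual fold scores gives a rough sense of how stable that estimate is on a small sample. Comparing the estimate against accuracy on a large independent sample, as the paper's experiments do, would require additional data not shown here.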
Pages: 275-280
Number of pages: 6