Distribution-balanced stratified cross-validation for accuracy estimation

Cited by: 155
Authors
Zeng, XC [1]
Martinez, TR [1]
Affiliation
[1] Brigham Young Univ, Dept Comp Sci, Provo, UT 84602 USA
Keywords
cross-validation; machine learning research; true accuracy; classifier
DOI
10.1080/095281300146272
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-validation has often been applied in machine learning research for estimating the accuracies of classifiers. In this work, we propose an extension to this method, called distribution-balanced stratified cross-validation (DBSCV), which improves the estimation quality by providing balanced intraclass distributions when partitioning a data set into multiple folds. We have tested DBSCV on nine real-world and three artificial domains using the C4.5 decision trees classifier. The results show that DBSCV performs better (has smaller biases) than the regular stratified cross-validation in most cases, especially when the number of folds is small. The analysis and experiments based on three artificial data sets also reveal that DBSCV is particularly effective when multiple intraclass clusters exist in a data set.
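The abstract does not spell out the partitioning procedure, so the following is only a minimal sketch of one way to obtain balanced intraclass distributions across folds. It assumes numeric feature vectors and Euclidean distance: within each class, instances are chained in nearest-neighbor order and then dealt to folds round-robin, so that close neighbors (for example, members of the same intraclass cluster) are spread over different folds. The function name `balanced_stratified_folds` and its parameters are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import defaultdict


def balanced_stratified_folds(X, y, n_folds, seed=0):
    """Return a fold index (0..n_folds-1) for every instance.

    Assumed procedure (an illustration, not the paper's stated algorithm):
    within each class, visit instances in nearest-neighbor chain order and
    assign them to folds round-robin.
    """
    rng = random.Random(seed)

    # Group instance indices by class label (plain stratification step).
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    folds = [None] * len(y)
    next_fold = 0  # carried across classes so fold sizes stay even
    for members in by_class.values():
        remaining = list(members)
        # Start the chain from a random instance of this class.
        current = remaining.pop(rng.randrange(len(remaining)))
        while True:
            folds[current] = next_fold
            next_fold = (next_fold + 1) % n_folds
            if not remaining:
                break
            # Move to the closest unassigned instance of the same class.
            nearest = min(remaining, key=lambda j: math.dist(X[current], X[j]))
            remaining.remove(nearest)
            current = nearest
    return folds
```

With a single class made of two well-separated clusters of four points each and `n_folds=2`, this sketch places two points from each cluster in each fold, whereas purely random stratified assignment could put an entire cluster in one fold.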
Pages: 1-12
Page count: 12