An Improved Cross-Validated Adversarial Validation Method

Authors
Zhang, Wen [1]
Liu, Zhengjiang [1]
Xue, Yan [2]
Wang, Ruibo [3]
Cao, Xuefei [1]
Li, Jihong [3]
Affiliations
[1] Shanxi Univ, Sch Automat & Software Engn, Taiyuan 030006, Peoples R China
[2] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Peoples R China
[3] Shanxi Univ, Sch Modern Educ Technol, Taiyuan 030006, Peoples R China
Keywords
Adversarial Validation; Cross Validation; Algorithm Comparison; Significance Testing; Distribution Shift; Dataset Shift; Tests
DOI
10.1007/978-3-031-40283-8_29
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Adversarial validation, a strategy widely used among Kaggle competitors, provides a novel framework for selecting reasonable training and validation sets. It depends heavily on accurately identifying the difference between the distributions of the training and test sets released in a Kaggle competition. However, typical adversarial validation uses only a K-fold cross-validated point estimator to measure this difference and ignores the variation of the estimator. As a result, it tends to produce false positive conclusions. In this study, we reconsider adversarial validation from the perspective of algorithm comparison. Specifically, we formulate adversarial validation as a task of comparing a well-trained classifier with a random-guessing classifier on an adversarial data set. We then investigate state-of-the-art algorithm comparison methods to improve the adversarial validation method and reduce false positive conclusions. We conducted extensive simulated and real-world experiments, which show that the recently proposed 5 x 2 BCV McNemar's test can significantly improve the performance of the adversarial validation method.
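The typical procedure that the abstract criticizes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier, the function names, and the synthetic Gaussian data are all hypothetical choices made for the example, and the 5 x 2 BCV McNemar's test itself is not implemented here. Training points are labeled 0, test points are labeled 1, and the K-fold cross-validated accuracy of a classifier that separates them is compared with 0.5, the accuracy of random guessing.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Return one centroid per label (a deliberately simple stand-in classifier)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def adversarial_cv_accuracy(train, test, k=5, seed=0):
    """K-fold cross-validated point estimate of train-vs-test separability."""
    X = train + test
    y = [0] * len(train) + [1] * len(test)  # adversarial labels
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        fit_idx = [i for i in idx if i not in held]
        centroids = fit_centroids([X[i] for i in fit_idx], [y[i] for i in fit_idx])
        correct = sum(predict(centroids, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return mean(accuracies)


def make_sample(n, shift, rng):
    """Draw n two-dimensional Gaussian points centered at (shift, shift)."""
    return [[rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)] for _ in range(n)]


rng = random.Random(42)
train = make_sample(200, 0.0, rng)
same_test = make_sample(200, 0.0, rng)     # no distribution shift
shifted_test = make_sample(200, 3.0, rng)  # clear distribution shift

same_acc = adversarial_cv_accuracy(train, same_test)
shifted_acc = adversarial_cv_accuracy(train, shifted_test)
print(f"no shift: {same_acc:.3f}, shifted: {shifted_acc:.3f}")

# The point estimate changes with the fold partition even when nothing else does.
per_seed = [adversarial_cv_accuracy(train, same_test, seed=s) for s in range(5)]
print("no-shift estimates across partitions:", [round(a, 3) for a in per_seed])
```

Because the estimate depends on the random partition, judging a shift by whether a single value exceeds 0.5 can flag a difference that is only sampling noise. That is the false positive problem the abstract says a significance test on the comparison with random guessing is meant to reduce.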
Pages: 343-353
Page count: 11