Estimation of predictive performance in high-dimensional data settings using learning curves

Cited by: 0
Authors
Goedhart, Jeroen M. [1]
Klausch, Thomas [1]
van de Wiel, Mark A. [1]
Affiliation
[1] Amsterdam Univ Med Ctr, Amsterdam Publ Hlth Res Inst, Dept Epidemiol & Data Sci, De Boelelaan 1117, NL-1081 HV Amsterdam, Netherlands
Keywords
High-dimensional data; Omics; Predictive performance; Area under the receiver operating curve; Bootstrap; Cross-validation; ERROR RATE; AREA; CLASSIFICATION; SIGNATURES; CANCER; SIZE
DOI
10.1016/j.csda.2022.107622
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In high-dimensional prediction settings, it remains challenging to reliably estimate the test performance. To address this challenge, a novel performance estimation framework is presented. This framework, called Learn2Evaluate, is based on learning curves: it fits a smooth monotone curve depicting test performance as a function of the sample size. Learn2Evaluate has several advantages compared to commonly applied performance estimation methodologies. Firstly, a learning curve offers a graphical overview of a learner. This overview assists in assessing the potential benefit of adding training samples, and it provides a more complete comparison between learners than performance estimates at a fixed subsample size. Secondly, a learning curve facilitates estimating the performance at the total sample size rather than at a subsample size. Thirdly, Learn2Evaluate allows the computation of a theoretically justified and useful lower confidence bound. Furthermore, this bound may be tightened by performing a bias correction. The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data. © 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
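The learning-curve idea in the abstract can be illustrated with a minimal sketch. It is not the Learn2Evaluate implementation: it swaps the smooth monotone fit for a plain (non-smooth) least-squares monotone fit via the pool-adjacent-violators algorithm, and the function name `monotone_fit`, the subsample sizes, and the AUC values are all hypothetical, chosen only for illustration.

```python
def monotone_fit(values, weights=None):
    """Weighted least-squares non-decreasing fit (pool-adjacent-violators).

    Assumption: a stand-in for the smooth monotone curve described in the
    abstract; this fit is monotone but piecewise constant, not smooth.
    """
    if weights is None:
        weights = [1.0] * len(values)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while a block's mean exceeds its successor's mean.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the blocks back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical subsample sizes and mean test AUCs (made-up numbers,
# standing in for averages over repeated subsampling at each size).
sizes = [20, 40, 60, 80, 100]
mean_auc = [0.60, 0.58, 0.66, 0.70, 0.69]

curve = monotone_fit(mean_auc)
for n, raw, fit in zip(sizes, mean_auc, curve):
    print(f"n={n:3d}  raw={raw:.3f}  fitted={fit:.3f}")
```

The fitted values never decrease with the subsample size, which matches the assumption that more training samples should not hurt test performance. Estimating the performance at the total sample size, as the abstract describes, would additionally require a curve that can be evaluated beyond the largest subsample size; this sketch does not attempt that step.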
Pages: 13