Estimation of predictive performance in high-dimensional data settings using learning curves

Times cited: 0
Authors
Goedhart, Jeroen M. [1]
Klausch, Thomas [1]
van de Wiel, Mark A. [1]
Affiliation
[1] Amsterdam Univ Med Ctr, Amsterdam Publ Hlth Res Inst, Dept Epidemiol & Data Sci, De Boelelaan 1117, NL-1081 HV Amsterdam, Netherlands
Keywords
High-dimensional data; Omics; Predictive performance; Area under the receiver operating curve; Bootstrap; Cross-validation; CROSS-VALIDATION; ERROR RATE; AREA; CLASSIFICATION; SIGNATURES; CANCER; SIZE
DOI
10.1016/j.csda.2022.107622
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In high-dimensional prediction settings, reliably estimating test performance remains challenging. To address this challenge, a novel performance estimation framework is presented. This framework, called Learn2Evaluate, is based on learning curves: it fits a smooth monotone curve that depicts test performance as a function of the sample size. Learn2Evaluate has several advantages over commonly applied performance estimation methodologies. Firstly, a learning curve offers a graphical overview of a learner. This overview helps assess the potential benefit of adding training samples, and it provides a more complete comparison between learners than performance estimates at a fixed subsample size do. Secondly, a learning curve facilitates estimating the performance at the total sample size rather than at a subsample size. Thirdly, Learn2Evaluate allows the computation of a theoretically justified and useful lower confidence bound. Furthermore, this bound may be tightened by performing a bias correction. The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data.
(c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
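The abstract describes fitting a smooth, monotone learning curve to performance estimates obtained at several subsample sizes and then reading off the performance at the total sample size. The following is a minimal sketch of that general idea, not the Learn2Evaluate implementation. It assumes an inverse power-law curve a - b * n^(-c) with b >= 0 and c > 0, a common learning-curve family; the abstract does not state which curve family the paper uses. The curve is fitted by least squares over a grid of exponents c. The function names `fit_learning_curve` and `predict`, the exponent grid, and the example sizes are hypothetical. In practice each input score would be a resampled performance estimate, such as a cross-validated AUC, at that subsample size. The sketch does not implement the paper's lower confidence bound or bias correction.

```python
def _fit_fixed_exponent(ns, ys, c):
    """Least-squares fit of y = a - b * n**(-c) for a fixed exponent c, with b >= 0."""
    xs = [n ** (-c) for n in ns]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # unconstrained slope of y on x; the model implies slope = -b
    b = max(0.0, -slope)  # clip so the fitted curve is non-decreasing in n
    a = mean_y + b * mean_x  # least-squares intercept given b
    sse = sum((y - (a - b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_learning_curve(ns, ys, c_grid=None):
    """Hypothetical helper: fit a monotone inverse power-law curve by grid search over c."""
    if c_grid is None:
        c_grid = [k / 100 for k in range(5, 201)]  # exponents 0.05, 0.06, ..., 2.00
    best = None
    for c in c_grid:
        a, b, sse = _fit_fixed_exponent(ns, ys, c)
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    a, b, c, _ = best
    return a, b, c


def predict(params, n):
    """Evaluate the fitted curve a - b * n**(-c) at sample size n."""
    a, b, c = params
    return a - b * n ** (-c)


if __name__ == "__main__":
    # Synthetic, noise-free points from a known curve (illustration only, not data from the paper).
    true_params = (0.85, 0.6, 0.5)
    sizes = [20, 40, 60, 80, 100]  # hypothetical subsample sizes
    scores = [predict(true_params, n) for n in sizes]
    fitted = fit_learning_curve(sizes, scores)
    print("fitted (a, b, c):", fitted)
    # Extrapolate to a hypothetical total sample size N = 150.
    print("predicted performance at N = 150:", predict(fitted, 150))
```

For a fixed exponent c, the model is linear in a and b. This lets a one-dimensional grid search combined with closed-form least squares replace a general nonlinear optimizer. Constraining b >= 0 is what makes the curve monotone. Real learning-curve points are noisy, so a practical fit might weight points by their estimated variance.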
Pages: 13