Penalized Regression Methods With Modified Cross-Validation and Bootstrap Tuning Produce Better Prediction Models

Times cited: 1
Authors
Pavlou, Menelaos [1]
Omar, Rumana Z. [1]
Ambler, Gareth [1]
Affiliation
[1] UCL, Department of Statistical Science, London, England
Funding
Medical Research Council (UK)
Keywords
SHRINKAGE; LIKELIHOOD; SELECTION
DOI
10.1002/bimj.202300245
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Risk prediction models fitted using maximum likelihood estimation (MLE) are often overfitted, resulting in predictions that are too extreme and a calibration slope (CS) less than 1. Penalized methods, such as Ridge and Lasso, have been suggested as a solution to this problem because they tend to shrink regression coefficients toward zero, resulting in predictions closer to the average. The amount of shrinkage is regulated by a tuning parameter, $\lambda$, commonly selected via cross-validation ("standard tuning"). Although penalized methods have been found to improve calibration on average, they often over-shrink and exhibit large variability in the selected $\lambda$ and hence in the CS. This is a problem particularly for small sample sizes, but also when using the sample sizes recommended to control overfitting. We consider whether these problems are partly due to selecting $\lambda$ by cross-validation with "training" datasets that are smaller than the original development sample, resulting in over-estimation of $\lambda$ and, hence, excessive shrinkage. We propose a modified cross-validation tuning method ("modified tuning"), which estimates $\lambda$ from a pseudo-development dataset obtained by bootstrapping from the original dataset, but of larger size, chosen so that the resulting cross-validation training datasets are the same size as the original dataset. Modified tuning can be easily implemented in standard software and is closely related to bootstrap selection of the tuning parameter ("bootstrap tuning"). We evaluated modified and bootstrap tuning for Ridge and Lasso in simulated and real data, using the recommended sample sizes as well as sizes slightly lower and higher. Both methods substantially improved the selection of $\lambda$, resulting in improved CS compared with standard tuning. They also improved predictions compared with MLE.
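The core idea of modified tuning can be sketched in a few lines. This is a minimal, hedged illustration, not the authors' implementation. It assumes a linear Ridge model with a single roughly centred predictor and no intercept (the paper itself tunes Ridge and Lasso risk models), uses squared prediction error as the cross-validation loss, and draws a single bootstrap sample. All function names (`ridge_1d`, `cv_select_lambda`, `pseudo_development_size`, `modified_tuning`) are hypothetical. The key step is the size of the pseudo-development dataset: with $K$-fold cross-validation each training set holds about $(K-1)/K$ of the data, so bootstrapping $N' = \lceil NK/(K-1) \rceil$ observations makes every training set at least as large as the original sample size $N$.

```python
import math
import random

# Hypothetical helpers, for illustration only. Assumptions: a linear Ridge model
# with one roughly centred predictor and no intercept, squared-error CV loss,
# and a single bootstrap draw (the paper itself tunes Ridge/Lasso risk models).


def ridge_1d(xs, ys, lam):
    """Closed-form Ridge slope: minimises sum((y - b*x)**2) + lam * b**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_select_lambda(xs, ys, lambdas, k, rng):
    """Standard tuning: K-fold CV, returning the lambda with the lowest
    mean squared prediction error over the held-out folds."""
    n = len(xs)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most 1
    best_lam, best_err = None, math.inf
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            b = ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
            err += sum((ys[i] - b * xs[i]) ** 2 for i in fold)
        err /= n
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam


def pseudo_development_size(n, k):
    """Smallest N' whose K-fold training sets all contain at least n rows.
    Each training set has N' - ceil(N'/k) = floor(N' * (k - 1) / k) rows,
    so N' = ceil(n * k / (k - 1)), computed here with integer arithmetic."""
    return -(-(n * k) // (k - 1))


def modified_tuning(xs, ys, lambdas, k, rng):
    """Modified tuning: bootstrap a pseudo-development dataset of size N'
    from the original data, then run standard K-fold CV on it."""
    n = len(xs)
    boot = [rng.randrange(n) for _ in range(pseudo_development_size(n, k))]
    return cv_select_lambda([xs[i] for i in boot], [ys[i] for i in boot],
                            lambdas, k, rng)


if __name__ == "__main__":
    data_rng = random.Random(0)
    xs = [data_rng.gauss(0, 1) for _ in range(50)]
    ys = [0.5 * x + data_rng.gauss(0, 1) for x in xs]
    grid = [0.0, 1.0, 5.0, 10.0, 50.0]
    print("standard tuning:", cv_select_lambda(xs, ys, grid, 5, random.Random(1)))
    print("modified tuning:", modified_tuning(xs, ys, grid, 5, random.Random(1)))
```

For example, with $N = 100$ and 10-fold cross-validation the sketch bootstraps $N' = 112$ observations, so each training fold has 100 or 101 rows instead of the 90 used by standard tuning. The selected $\lambda$ is therefore tuned for a model fitted to roughly the original sample size, which is the mechanism the abstract proposes for reducing over-shrinkage.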
Pages: 11