Penalized Regression Methods With Modified Cross-Validation and Bootstrap Tuning Produce Better Prediction Models

Cited by: 1

Authors:

Pavlou, Menelaos [1]

Omar, Rumana Z. [1]

Ambler, Gareth [1]

Affiliations:

[1] UCL, Dept Stat Sci, London, England

Source:

BIOMETRICAL JOURNAL | 2024 / Vol. 66 / Issue 05

Funding:

UK Medical Research Council;

Keywords:

SHRINKAGE; LIKELIHOOD; SELECTION;

DOI:

10.1002/bimj.202300245

Chinese Library Classification:

Q [Biological Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Risk prediction models fitted using maximum likelihood estimation (MLE) are often overfitted, resulting in predictions that are too extreme and a calibration slope (CS) less than 1. Penalized methods, such as Ridge and Lasso, have been suggested as a solution to this problem as they tend to shrink regression coefficients toward zero, resulting in predictions closer to the average. The amount of shrinkage is regulated by a tuning parameter, $\lambda$, commonly selected via cross-validation ("standard tuning"). Though penalized methods have been found to improve calibration on average, they often over-shrink and exhibit large variability in the selected $\lambda$ and hence the CS. This is a problem, particularly for small sample sizes, but also when using sample sizes recommended to control overfitting. We consider whether these problems are partly due to selecting $\lambda$ using cross-validation with "training" datasets of reduced size compared to the original development sample, resulting in an over-estimation of $\lambda$ and, hence, excessive shrinkage. We propose a modified cross-validation tuning method ("modified tuning"), which estimates $\lambda$ from a pseudo-development dataset obtained via bootstrapping from the original dataset, albeit of larger size, such that the resulting cross-validation training datasets are of the same size as the original dataset. Modified tuning can be easily implemented in standard software and is closely related to bootstrap selection of the tuning parameter ("bootstrap tuning"). We evaluated modified and bootstrap tuning for Ridge and Lasso in simulated and real data using recommended sample sizes, and sizes slightly lower and higher. They substantially improved the selection of $\lambda$, resulting in improved CS compared to the standard tuning method. They also improved predictions compared to MLE.
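The size arithmetic behind modified tuning can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the rounding rule, and the choice of K = 10 folds are assumptions made for illustration. The sketch assumes that, with K-fold cross-validation, each training set holds a fraction (K - 1)/K of the rows, so a pseudo-development sample of about nK/(K - 1) rows gives training sets of about n rows. The penalized fit and the selection of $\lambda$ itself are left to whatever standard software is used, and the sketch does not address how duplicated rows that fall in both training and test folds should be treated.

```python
import random


def pseudo_development_size(n, k):
    """Bootstrap sample size m whose K-fold training sets have about n rows.

    Each training set holds (k - 1) / k of the m rows, so m = n * k / (k - 1).
    Rounding to the nearest integer is an assumption of this sketch.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    return round(n * k / (k - 1))


def bootstrap_indices(n, m, rng):
    """Draw m row indices with replacement from the n original rows."""
    return [rng.randrange(n) for _ in range(m)]


def kfold_splits(m, k, rng):
    """Yield (train, test) position lists for K-fold CV over m rows."""
    positions = list(range(m))
    rng.shuffle(positions)
    folds = [positions[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test


# Hypothetical development sample of n = 90 rows, tuned with K = 10 folds.
n, k = 90, 10
rng = random.Random(0)
m = pseudo_development_size(n, k)      # enlarged pseudo-development size
rows = bootstrap_indices(n, m, rng)    # which original rows enter the sample
train_sizes = [len(train) for train, _ in kfold_splits(m, k, rng)]
```

With these illustrative values the pseudo-development sample has 100 rows and every training fold has 90 rows, matching the original sample size. Under standard tuning on the original 90 rows, each training fold would have only 81 rows.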


Pages: 11
