Data splitting strategies for reducing the effect of model selection on inference

Cited by: 0

Authors:

Faraway, JJ [1]

Affiliations:

[1] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA

Source:

DIMENSION REDUCTION, COMPUTATIONAL COMPLEXITY AND INFORMATION | 1998 / Vol. 30

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

When an appropriate model for data is not completely known, the data is often used to select a model. Very often, inference is then made from the selected model as if it had been known from the beginning. Estimates of the error of predictions or other quantities associated with that model take account of the uncertainty about the parameters of the model, but not the uncertainty about the model itself. Such error estimates tend to be too small, especially when the model uncertainty dominates the parametric uncertainty. Models are usually selected on the basis of fit, so the data typically fit the selected model rather well, which makes the error seem small. In data splitting, one part of the data is used solely for model selection and the other part for inference, thus hopefully avoiding the over-optimism induced by using the same data both to select a model and to estimate its parameters. Data splitting is easy to implement and is thus an attractive alternative to complex methods of adjusting for the effect of model selection on inference. Three tasks need to be performed: model selection, prediction and error assessment. We investigate different strategies for allotting the two parts of the data between these three tasks. We devise a new graphical method, called an honesty plot, for jointly assessing prediction accuracy and error estimates. The plot can be used to show the actual coverage of confidence intervals of any given nominal level. Variable selection, Box-Cox transformation and more complex simulation experiments are used to compare the various strategies. The performance of data splitting is found to be no better than using all the data for both selection and inference.
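The basic split described in the abstract (one part of the data for model selection, the other for inference) can be illustrated with a minimal sketch. The abstract does not say which selection criterion or split fraction the author used, so the exhaustive AIC search over predictor subsets, the 50/50 split, and every function name below are assumptions made purely for illustration, not the paper's procedure.

```python
import itertools
import numpy as np


def design(X, cols):
    """Intercept column plus the chosen predictor columns."""
    return np.column_stack([np.ones(len(X))] + [X[:, j] for j in cols])


def fit_rss(X, y, cols):
    """Least-squares fit on the chosen columns; returns (coefficients, RSS)."""
    D = design(X, cols)
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    return beta, float(resid @ resid)


def select_subset(X, y):
    """Assumed selection rule: the predictor subset with the smallest AIC."""
    n, p = X.shape
    best_cols, best_aic = (), np.inf
    for k in range(p + 1):
        for cols in itertools.combinations(range(p), k):
            _, s = fit_rss(X, y, cols)
            aic = n * np.log(s / n) + 2 * (k + 1)  # k slopes + intercept
            if aic < best_aic:
                best_cols, best_aic = cols, aic
    return best_cols


def split_select_then_infer(X, y, frac=0.5, seed=0):
    """Select a model on one random part, estimate it on the other part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    m = int(frac * len(y))
    sel, inf = idx[:m], idx[m:]
    cols = select_subset(X[sel], y[sel])       # selection part only
    beta, s = fit_rss(X[inf], y[inf], cols)    # inference part only
    dof = len(inf) - (len(cols) + 1)
    sigma2 = s / dof                           # residual variance estimate
    return cols, beta, sigma2
```

Calling `split_select_then_infer(X, y)` on an `n x p` predictor array `X` and response vector `y` returns the selected column indices together with coefficient and error-variance estimates computed from data that played no role in the selection. The exhaustive search is only practical for a small number of predictors.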


Pages: 332-341

Number of pages: 10
