Does data splitting improve prediction?

Cited by: 17

Authors:

Faraway, Julian J. [1]

Affiliation:

[1] Univ Bath, Dept Math Sci, Bath BA2 7AY, Avon, England

Source:

STATISTICS AND COMPUTING | 2016 / Vol. 26 / Issue 1-2

Keywords:

Cross-validation; Model assessment; Model uncertainty; Model validation; Prediction; Scoring; MODEL SELECTION; VALIDATION; ERROR;

DOI:

10.1007/s11222-014-9522-9

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202 ;

Abstract:

Data splitting divides data into two parts. One part is reserved for model selection. In some applications, the second part is used for model validation but we use this part for estimating the parameters of the chosen model. We focus on the problem of constructing reliable predictive distributions for future observed values. We judge the predictive performance using log scoring. We compare the full data strategy with the data splitting strategy for prediction. We show how the full data score can be decomposed into model selection, parameter estimation and data reuse costs. Data splitting is preferred when data reuse costs are high. We investigate the relative performance of the strategies in four simulation scenarios. We introduce a hybrid estimator that uses one part for model selection but both parts for estimation. We argue that a split data analysis is preferred to a full data analysis for prediction with some exceptions.
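
The three strategies named in the abstract (full data, data splitting, and the hybrid estimator) can be illustrated with a minimal sketch. It is not the paper's code. The nested linear candidate models, AIC as the selection rule, the 50/50 random split, the plug-in Gaussian predictive distribution, and all function names (`fit_ols`, `select_k`, `log_score`, `compare_strategies`, `simulate`) are assumptions made for illustration; the abstract specifies none of them.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit; returns coefficients and the ML error-variance estimate."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / len(y)
    return beta, sigma2


def select_k(X, y):
    """Assumed selection rule: choose how many leading columns to keep by AIC."""
    n, p = X.shape
    best_k, best_aic = 1, np.inf
    for k in range(1, p + 1):
        _, sigma2 = fit_ols(X[:, :k], y)
        aic = n * np.log(sigma2) + 2 * (k + 1)  # Gaussian AIC up to a constant
        if aic < best_aic:
            best_k, best_aic = k, aic
    return best_k


def log_score(X_new, y_new, beta, sigma2):
    """Mean negative log density of y_new under N(X_new @ beta, sigma2); lower is better."""
    resid = y_new - X_new @ beta
    return float(np.mean(0.5 * np.log(2 * np.pi * sigma2) + resid**2 / (2 * sigma2)))


def compare_strategies(X, y, X_new, y_new, rng):
    """Log scores on new data for the full, split, and hybrid strategies."""
    idx = rng.permutation(len(y))
    a, b = idx[: len(y) // 2], idx[len(y) // 2:]  # assumed 50/50 split
    scores = {}

    # Full data: select and estimate on all observations (data are reused).
    k = select_k(X, y)
    beta, s2 = fit_ols(X[:, :k], y)
    scores["full"] = log_score(X_new[:, :k], y_new, beta, s2)

    # Split: select on part a, estimate on part b only.
    k = select_k(X[a], y[a])
    beta, s2 = fit_ols(X[b][:, :k], y[b])
    scores["split"] = log_score(X_new[:, :k], y_new, beta, s2)

    # Hybrid: keep the model selected on part a, estimate on both parts.
    beta, s2 = fit_ols(X[:, :k], y)
    scores["hybrid"] = log_score(X_new[:, :k], y_new, beta, s2)
    return scores


def simulate(n, rng):
    """Hypothetical scenario: intercept plus 5 predictors, only the first 3 columns matter."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
    y = X[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
    return X, y


rng = np.random.default_rng(0)
X, y = simulate(200, rng)
X_new, y_new = simulate(2000, rng)
scores = compare_strategies(X, y, X_new, y_new, rng)
print(scores)
```

A single run like this says little about which strategy is better. A fair comparison would average the scores over many simulated data sets and over scenarios that differ in how costly data reuse is.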


Pages: 49-60

Page count: 12


