Does data splitting improve prediction?

Cited by: 17
Authors
Faraway, Julian J. [1]
Affiliation
[1] Univ Bath, Dept Math Sci, Bath BA2 7AY, Avon, England
Keywords
Cross-validation; Model assessment; Model uncertainty; Model validation; Prediction; Scoring; MODEL SELECTION; VALIDATION; ERROR
DOI
10.1007/s11222-014-9522-9
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Data splitting divides data into two parts. One part is reserved for model selection. In some applications, the second part is used for model validation, but we use this part for estimating the parameters of the chosen model. We focus on the problem of constructing reliable predictive distributions for future observed values. We judge the predictive performance using log scoring. We compare the full data strategy with the data splitting strategy for prediction. We show how the full data score can be decomposed into model selection, parameter estimation and data reuse costs. Data splitting is preferred when data reuse costs are high. We investigate the relative performance of the strategies in four simulation scenarios. We introduce a hybrid estimator that uses one part for model selection but both parts for estimation. We argue that a split data analysis is preferred to a full data analysis for prediction, with some exceptions.
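The three strategies named in the abstract (full data, split data, hybrid) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the simulated linear model, the nested candidate models, BIC as the selection rule, a plug-in Gaussian predictive distribution, and a log score defined as the mean negative log predictive density (lower is better). The function names are hypothetical.

```python
import numpy as np

def fit_gaussian_linear(X, y):
    """Least-squares fit; returns coefficients and the ML estimate of the error s.d."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, np.sqrt(np.mean(resid ** 2))

def bic(X, y):
    """BIC of a Gaussian linear model (assumed selection rule, not the paper's)."""
    n, p = X.shape
    _, sigma = fit_gaussian_linear(X, y)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma ** 2) + 1)
    return -2 * loglik + (p + 1) * np.log(n)

def select_model(X, y, candidates):
    """Return the candidate column set with the smallest BIC."""
    return min(candidates, key=lambda cols: bic(X[:, cols], y))

def mean_log_score(X, y, beta, sigma):
    """Mean negative log density of y under the plug-in N(X beta, sigma^2)."""
    resid = y - X @ beta
    return np.mean(0.5 * np.log(2 * np.pi * sigma ** 2) + resid ** 2 / (2 * sigma ** 2))

def compare_strategies(X, y, X_new, y_new, candidates, rng, frac=0.5):
    """Score each strategy on fresh data; frac is the share used for selection."""
    n = len(y)
    perm = rng.permutation(n)
    n_sel = int(frac * n)
    sel, est, everything = perm[:n_sel], perm[n_sel:], np.arange(n)
    plans = {
        "full": (everything, everything),  # select and estimate on all data
        "split": (sel, est),               # select on part 1, estimate on part 2
        "hybrid": (sel, everything),       # select on part 1, estimate on all data
    }
    scores = {}
    for name, (sel_idx, est_idx) in plans.items():
        cols = select_model(X[sel_idx], y[sel_idx], candidates)
        beta, sigma = fit_gaussian_linear(X[est_idx][:, cols], y[est_idx])
        scores[name] = mean_log_score(X_new[:, cols], y_new, beta, sigma)
    return scores

def simulate(n, rng):
    """Toy data: intercept plus 5 predictors, only the first two predictors matter."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
    y = X[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
    return X, y

rng = np.random.default_rng(0)
X, y = simulate(100, rng)
X_new, y_new = simulate(5000, rng)  # stands in for future observations
candidates = [list(range(k)) for k in range(1, 7)]  # nested models
scores = compare_strategies(X, y, X_new, y_new, candidates, rng)
print(scores)
```

A single run like this says little about which strategy is better; any comparison would need averaging over many simulated data sets, and the ranking may change with the scenario, the selection rule, and the split fraction.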
Pages: 49-60
Page count: 12
Source
Statistics and Computing, 2016, 26: 49-60