Does data splitting improve prediction?

Cited by: 17
Authors
Faraway, Julian J. [1]
Affiliation
[1] Univ Bath, Dept Math Sci, Bath BA2 7AY, Avon, England
Keywords
Cross-validation; Model assessment; Model uncertainty; Model validation; Prediction; Scoring; Model selection; Validation; Error
DOI
10.1007/s11222-014-9522-9
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Data splitting divides data into two parts. One part is reserved for model selection. In some applications the second part is used for model validation, but here we use it to estimate the parameters of the chosen model. We focus on the problem of constructing reliable predictive distributions for future observed values, and we judge predictive performance using log scoring. We compare the full data strategy with the data splitting strategy for prediction. We show how the full data score can be decomposed into model selection, parameter estimation, and data reuse costs. Data splitting is preferred when data reuse costs are high. We investigate the relative performance of the strategies in four simulation scenarios. We introduce a hybrid estimator that uses one part for model selection but both parts for estimation. We argue that, with some exceptions, a split data analysis is preferred to a full data analysis for prediction.
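The three strategies named in the abstract (full data, split data, and the hybrid estimator) can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes nested Gaussian linear models, AIC as the selection rule, a 50/50 split, and a plug-in normal predictive density scored by the mean negative log density. All function names here are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit; returns coefficients and the ML error variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid) / len(y)


def select_by_aic(X, y):
    """Assumed selection rule: use the first k columns, with k minimizing AIC."""
    n = len(y)
    best_k, best_aic = 1, np.inf
    for k in range(1, X.shape[1] + 1):
        _, sigma2 = fit_ols(X[:, :k], y)
        aic = n * np.log(sigma2) + 2 * (k + 1)
        if aic < best_aic:
            best_k, best_aic = k, aic
    return best_k


def log_score(X, y, beta, sigma2):
    """Mean negative log density of y under N(X beta, sigma2); lower is better."""
    resid = y - X @ beta
    return float(np.mean(0.5 * np.log(2 * np.pi * sigma2) + resid**2 / (2 * sigma2)))


def compare_strategies(X, y, X_new, y_new):
    """Score full, split, and hybrid strategies on fresh data (X_new, y_new)."""
    half = len(y) // 2  # assumed 50/50 split
    X1, y1, X2, y2 = X[:half], y[:half], X[half:], y[half:]
    k_full = select_by_aic(X, y)     # full: select on all data
    k_split = select_by_aic(X1, y1)  # split and hybrid: select on part one
    fits = {
        "full": (k_full, fit_ols(X[:, :k_full], y)),        # estimate on all data
        "split": (k_split, fit_ols(X2[:, :k_split], y2)),   # estimate on part two
        "hybrid": (k_split, fit_ols(X[:, :k_split], y)),    # estimate on both parts
    }
    return {
        name: log_score(X_new[:, :k], y_new, beta, sigma2)
        for name, (k, (beta, sigma2)) in fits.items()
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(200), rng.normal(size=(200, 4))])
    y = X[:, :2] @ np.array([1.0, 2.0]) + rng.normal(size=200)
    # First 100 rows are the training data, the rest are "future" observations.
    print(compare_strategies(X[:100], y[:100], X[100:], y[100:]))
```

A single run like this says little on its own; comparing the strategies would mean averaging the scores over many simulated data sets, in the spirit of the simulation scenarios the abstract mentions.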
Pages: 49-60
Page count: 12