Data Split Strategies for Evolving Predictive Models

Cited by: 7
Authors
Raykar, Vikas C. [1]
Saha, Amrita [1]
Affiliation
[1] IBM Res, Bangalore, Karnataka, India
Keywords
Data splits; Model assessment; Predictive models
DOI
10.1007/978-3-319-23528-8_1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A conventional textbook prescription for building good predictive models is to split the data into three parts: a training set (for model fitting), a validation set (for model selection), and a test set (for final model assessment). Predictive models can evolve over time as developers improve their performance, either by acquiring new data or by improving the existing model. The main contribution of this paper is to discuss the problems encountered in such dynamic model-building and updating scenarios and to propose workflows for allocating newly acquired data to the different sets. Specifically, we propose three workflows (parallel dump, serial waterfall, and hybrid) for allocating new data into the existing training, validation, and test splits. Particular emphasis is placed on avoiding the bias caused by repeated use of the existing validation or test set.
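The three-way split described in the abstract, and the need to add new data without disturbing earlier splits, can be illustrated with a minimal sketch. This is not one of the paper's three workflows (parallel dump, serial waterfall, hybrid), whose details are not given here; it swaps in a common hash-based assignment technique, assuming each example carries a stable, unique ID string. The helpers `assign_split` and `split_batch` and the 60/20/20 default fractions are hypothetical choices made for illustration.

```python
import hashlib


# Hypothetical helper (not from the paper): deterministically maps an
# example ID to "train", "validation" or "test" using a hash of the ID.
def assign_split(example_id: str, train_frac: float = 0.6, val_frac: float = 0.2) -> str:
    if not (0 < train_frac and 0 <= val_frac and train_frac + val_frac < 1):
        raise ValueError("fractions must leave a non-empty test share")
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    # Use the first 6 bytes (48 bits) so the scaled value is exactly in [0, 1).
    u = int.from_bytes(digest[:6], "big") / 2**48
    if u < train_frac:
        return "train"
    if u < train_frac + val_frac:
        return "validation"
    return "test"


# Hypothetical helper: groups a batch of example IDs by assigned split.
def split_batch(ids, **fractions):
    splits = {"train": [], "validation": [], "test": []}
    for example_id in ids:
        splits[assign_split(example_id, **fractions)].append(example_id)
    return splits


if __name__ == "__main__":
    initial = split_batch([f"ex{i}" for i in range(1000)])
    # Later, 200 newly acquired examples arrive alongside the original 1000.
    updated = split_batch([f"ex{i}" for i in range(1200)])
    for name in initial:
        print(name, len(initial[name]), "->", len(updated[name]))
```

Because an example's split depends only on its ID, re-running `split_batch` after new data arrives leaves every earlier example in its original split, so previously held-out test examples never migrate into training. It does not, by itself, address the bias from repeatedly reusing the same validation or test set, which is the concern the paper's workflows emphasize.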
Pages: 3-19
Number of pages: 17