Data Split Strategies for Evolving Predictive Models

Cited by: 7
Authors
Raykar, Vikas C. [1 ]
Saha, Amrita [1 ]
Affiliations
[1] IBM Res, Bangalore, Karnataka, India
Keywords
Data splits; Model assessment; Predictive models
DOI
10.1007/978-3-319-23528-8_1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A conventional textbook prescription for building good predictive models is to split the data into three parts: training set (for model fitting), validation set (for model selection), and test set (for final model assessment). Predictive models can potentially evolve over time as developers improve their performance either by acquiring new data or improving the existing model. The main contribution of this paper is to discuss problems encountered and propose workflows to manage the allocation of newly acquired data into different sets in such dynamic model building and updating scenarios. Specifically, we propose three different workflows (parallel dump, serial waterfall, and hybrid) for allocating new data into the existing training, validation, and test splits. Particular emphasis is laid on avoiding the bias due to the repeated use of the existing validation or the test set.
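The three-way split described in the abstract, and the general idea of folding a newly acquired batch into existing splits, can be sketched as follows. This is a minimal illustration only: the abstract does not define the parallel dump, serial waterfall, or hybrid workflows, so `allocate_new_batch` is a hypothetical proportional allocation and should not be read as any of the paper's named workflows. The function names, the 60/20/20 fractions, and the seeds are assumptions made for the example.

```python
import random


def three_way_split(items, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle items and cut them into training, validation, and test lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * frac_train)
    n_val = round(len(shuffled) * frac_val)
    return {
        "train": shuffled[:n_train],                      # model fitting
        "validation": shuffled[n_train:n_train + n_val],  # model selection
        "test": shuffled[n_train + n_val:],               # final assessment
    }


def allocate_new_batch(splits, new_items, frac_train=0.6, frac_val=0.2, seed=1):
    """Hypothetical allocation (an assumption, not the paper's method):
    split the new batch in the same proportions and append each part
    to the matching existing split, leaving old assignments untouched."""
    new_parts = three_way_split(new_items, frac_train, frac_val, seed)
    return {name: splits[name] + new_parts[name] for name in splits}


splits = three_way_split(range(100))
updated = allocate_new_batch(splits, range(100, 150))
print({name: len(part) for name, part in updated.items()})
```

Appending rather than reshuffling means no previously assigned example changes sets, so an item that was once in the test set never becomes training data.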
Pages: 3-19
Number of pages: 17