Autoreplicative random forests with applications to missing value imputation

Cited by: 0
Authors
Antonenko, Ekaterina [1, 2, 3, 4]
Carreno, Ander [5]
Read, Jesse [1]
Affiliations
[1] Ecole Polytech, LIX, IP Paris, F-91120 Palaiseau, France
[2] PSL Res Univ, CBIO Ctr Computat Biol, Mines Paris, F-75006 Paris, France
[3] PSL Res Univ, Inst Curie, F-75005 Paris, France
[4] INSERM, U900, F-75005 Paris, France
[5] Quant AI Lab, Madrid 28043, Spain
Keywords
Multi-label classification; Multi-output modeling; Missing value imputation; Probabilistic inference
DOI
10.1007/s10994-024-06584-1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing values are a common problem in data science and machine learning. Removing instances with missing values is a straightforward workaround, but this can significantly hinder subsequent data analysis, particularly when features outnumber instances. There are a variety of methodologies proposed in the literature for imputing missing values. Denoising Autoencoders, for example, have been leveraged efficiently for imputation. However, neural network approaches have been relatively less effective on smaller datasets. In this work, we propose Autoreplicative Random Forests (ARF) as a multi-output learning approach, which we introduce in the context of a framework that may impute via either an iterative or procedural process. Experiments on several low- and high-dimensional datasets show that ARF is computationally efficient and exhibits better imputation performance than its competitors, including neural network approaches. In order to provide statistical analysis and mathematical background to the proposed missing value imputation framework, we also propose probabilistic ARFs, where the confidence values are provided over different imputation hypotheses, therefore maximizing the utility of such a framework in a machine-learning pipeline targeting predictive performance.
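The abstract describes ARF only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm: a multi-output random forest is trained to reproduce its own input from a partially corrupted copy, then applied to rows with missing entries. The function name `autoreplicative_impute`, the `mask_rate` parameter, the column-mean placeholder, and the use of scikit-learn's `RandomForestRegressor` are all assumptions made for illustration, and the sketch covers numeric features only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def autoreplicative_impute(X, n_estimators=100, mask_rate=0.2, seed=0):
    """Fill NaNs in X with a multi-output forest trained to reproduce its input.

    Hypothetical helper for illustration; not the procedure from the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)

    # Train only on rows that are fully observed.
    complete_rows = ~missing.any(axis=1)
    if not complete_rows.any():
        raise ValueError("at least one fully observed row is required")
    X_complete = X[complete_rows]

    # Corrupt a random subset of entries (replace them with column means)
    # so the forest learns to recover true values from the other features.
    corrupt = rng.random(X_complete.shape) < mask_rate
    X_input = np.where(corrupt, col_means, X_complete)

    # Multi-output regression: the target is the uncorrupted data itself.
    forest = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    forest.fit(X_input, X_complete)

    # Use the same placeholder for real missing entries, then predict all outputs.
    X_filled = np.where(missing, col_means, X)
    X_pred = forest.predict(X_filled).reshape(X.shape)

    # Keep observed values and take predictions only where data was missing.
    return np.where(missing, X_pred, X)
```

This single-pass version trains on complete rows only, which is a simplification. An iterative variant, which the abstract mentions as one option in the framework, would presumably feed the imputed values back in and refit.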
Pages: 7617-7643
Page count: 27