Autoreplicative random forests with applications to missing value imputation

Cited by: 0
Authors
Antonenko, Ekaterina [1, 2, 3, 4]
Carreno, Ander [5]
Read, Jesse [1]
Affiliations
[1] Ecole Polytech, LIX, IP Paris, F-91120 Palaiseau, France
[2] PSL Res Univ, CBIO Ctr Computat Biol, Mines Paris, F-75006 Paris, France
[3] PSL Res Univ, Inst Curie, F-75005 Paris, France
[4] INSERM, U900, F-75005 Paris, France
[5] Quant AI Lab, Madrid 28043, Spain
Keywords
Multi-label classification; Multi-output modeling; Missing value imputation; Probabilistic inference
DOI
10.1007/s10994-024-06584-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing values are a common problem in data science and machine learning. Removing instances with missing values is a straightforward workaround, but this can significantly hinder subsequent data analysis, particularly when features outnumber instances. There are a variety of methodologies proposed in the literature for imputing missing values. Denoising Autoencoders, for example, have been leveraged efficiently for imputation. However, neural network approaches have been relatively less effective on smaller datasets. In this work, we propose Autoreplicative Random Forests (ARF) as a multi-output learning approach, which we introduce in the context of a framework that may impute via either an iterative or procedural process. Experiments on several low- and high-dimensional datasets show that ARF is computationally efficient and exhibits better imputation performance than its competitors, including neural network approaches. In order to provide statistical analysis and mathematical background to the proposed missing value imputation framework, we also propose probabilistic ARFs, where the confidence values are provided over different imputation hypotheses, therefore maximizing the utility of such a framework in a machine-learning pipeline targeting predictive performance.
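The abstract describes a multi-output forest used for imputation within an iterative framework. The following is only a minimal sketch of that general idea under stated assumptions, not the authors' algorithm: it assumes integer-coded categorical data with at least two columns, fits a scikit-learn `RandomForestClassifier` on the fully observed rows so that it reproduces its own input, and then repeatedly overwrites the missing entries of incomplete rows with the forest's predictions. The function name, the most-frequent-value initialization, and the parameters `n_iter` and `n_estimators` are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def impute_with_autoreplicative_forest(X, missing_mask, n_iter=3,
                                       n_estimators=100, random_state=0):
    """Sketch: impute categorical entries with a forest trained to output its input.

    X            : 2-D integer array; entries under the mask may hold any placeholder.
    missing_mask : boolean array, True where a value is missing.
    Assumes every column has at least one observed value and that
    some rows are fully observed (both are assumptions of this sketch).
    """
    X_filled = np.array(X, copy=True)
    complete_rows = ~missing_mask.any(axis=1)

    # Initial guess: most frequent observed value in each column.
    for j in range(X_filled.shape[1]):
        col_missing = missing_mask[:, j]
        if col_missing.any():
            values, counts = np.unique(X_filled[~col_missing, j],
                                       return_counts=True)
            X_filled[col_missing, j] = values[np.argmax(counts)]

    # Multi-output fit: the targets are the inputs themselves.
    forest = RandomForestClassifier(n_estimators=n_estimators,
                                    random_state=random_state)
    X_train = X_filled[complete_rows]
    forest.fit(X_train, X_train)

    incomplete = ~complete_rows
    if not incomplete.any():
        return X_filled

    # Iterative refinement: feed the current guesses back through the forest,
    # updating only the entries that were originally missing.
    for _ in range(n_iter):
        block = X_filled[incomplete]
        block_mask = missing_mask[incomplete]
        predictions = forest.predict(block)
        block[block_mask] = predictions[block_mask]
        X_filled[incomplete] = block

    return X_filled
```

Observed entries are never modified, so the forest's output only affects the masked positions. Replacing `forest.predict` with `forest.predict_proba` would expose per-class scores for each output, which is one possible route toward the confidence values the abstract mentions, though how the paper derives them is not stated here.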
Pages: 7617-7643
Number of pages: 27
Related papers
50 items in total
  • [41] On the use of adaptive nearest neighbors for missing value imputation
    Jhun, Myoungshic
    Jeong, Hyeong Chul
    Koo, Ja-Yong
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2007, 36 (06) : 1275 - 1286
  • [42] imputeTS: Time Series Missing Value Imputation in R
    Moritz, Steffen
    Bartz-Beielstein, Thomas
    R JOURNAL, 2017, 9 (01): : 207 - 218
  • [43] Optimization of Missing Value Imputation using Reinforcement Programming
    Rachmawan, Irene Erlyn Wina
    Barakbah, Ali Ridho
    2015 International Electronics Symposium (IES), 2015, : 128 - 133
  • [44] A Review On Missing Value Estimation Using Imputation Algorithm
    Armina, Roslan
    Zain, Azlan Mohd
    Ali, Nor Azizah
    Sallehuddin, Roselina
    6TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL MATHEMATICS (ICCSCM 2017), 2017, 892
  • [45] A hybrid imputation approach for microarray missing value estimation
    Huihui Li
    Changbo Zhao
    Fengfeng Shao
    Guo-Zheng Li
    Xiao Wang
    BMC Genomics, 16
  • [46] A hybrid imputation approach for microarray missing value estimation
    Li, Huihui
    Zhao, Changbo
    Shao, Fengfeng
    Li, Guo-Zheng
    Wang, Xiao
    BMC GENOMICS, 2015, 16
  • [47] Iterative missing value imputation based on feature importance
    Guo, Cong
    Yang, Wei
    Liu, Chun
    Li, Zheng
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (10) : 6387 - 6414
  • [48] Incorporating Nonlinear Relationships in Microarray Missing Value Imputation
    Yu, Tianwei
    Peng, Hesen
    Sun, Wei
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2011, 8 (03) : 723 - 731
  • [49] A robust missing value imputation method for noisy data
    Bing Zhu
    Changzheng He
    Panos Liatsis
    Applied Intelligence, 2012, 36 : 61 - 74
  • [50] Neighborhood-aware autoencoder for missing value imputation
    Aidos, Helena
    Tomas, Pedro
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 1542 - 1546