Gaussian processes for missing value imputation

Cited by: 10
Authors
Jafrasteh, Bahram [1]
Hernandez-Lobato, Daniel [2]
Lubian-Lopez, Simon Pedro
Benavente-Fernandez, Isabel [1, 3, 4]
Affiliations
[1] Puerta Mar Univ, Biomed Res & Innovat Inst, Cadiz INiB Res Unit, Cadiz, Spain
[2] Univ Autonoma Madrid, Comp Sci Dept, Madrid, Spain
[3] Puerta Mar Univ Hosp, Dept Pediat, Div Neonatol, Cadiz, Spain
[4] Univ Cadiz, Med Sch, Dept Child & Mother Hlth & Radiol, Area Pediat, Cadiz, Spain
Keywords
Missing values; Gaussian process; Deep learning; Deep Gaussian processes; Variational inference;
DOI
10.1016/j.knosys.2023.110603
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A missing value indicates that a particular attribute of an instance of a learning problem was not recorded. Missing values are very common in real-life datasets, yet most machine learning methods cannot handle them, so they must be imputed before training. Gaussian processes (GPs) are non-parametric models with accurate uncertainty estimates that, combined with sparse approximations and stochastic variational inference, scale to large datasets. Sparse GPs (SGPs) can be used to obtain a predictive distribution for missing values. We present a hierarchical composition of sparse GPs that predicts the missing values in each dimension using the observed values from the other dimensions. Importantly, we consider that the input attributes of each sparse GP used for prediction may themselves have missing values. The missing values in those input attributes are replaced by the predictions of the previous sparse GPs in the hierarchy. We call our approach missing GP (MGP). MGP can impute all observed missing values. It outputs a predictive distribution for each missing value, which is then used in the imputation of other missing values. We evaluate MGP on one private clinical dataset and on four UCI datasets with different percentages of missing values. Furthermore, we compare the performance of MGP with other state-of-the-art methods for imputing missing values, including variants based on sparse GPs and deep GPs. Our results show that the performance of MGP is significantly better.
© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
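The chaining idea in the abstract (each dimension is predicted from the others, and inputs that are themselves missing are filled with earlier predictions) can be illustrated with a small sketch. This is not the authors' MGP implementation: it swaps the sparse variational GPs for an exact GP posterior mean with a fixed RBF kernel, processes columns in an arbitrary left-to-right order, does no joint training, and discards the predictive variance. The function names (`rbf_kernel`, `gp_posterior_mean`, `chained_gp_impute`) and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)


def gp_posterior_mean(X_train, y_train, X_test, noise=1e-2):
    """Posterior mean of an exact GP regressor (stand-in for a sparse GP)."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train - y_mean)
    return rbf_kernel(X_test, X_train) @ alpha + y_mean


def chained_gp_impute(X):
    """Impute NaNs column by column; later columns reuse earlier imputations.

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Standardise with statistics of the observed entries so that a
    # unit-lengthscale kernel is a tolerable (assumed) default.
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std = np.where(std > 0, std, 1.0)
    Z = (X - mean) / std

    # Initialise missing entries with the column mean (0 after scaling)
    # so that every GP input is defined.
    filled = np.where(missing, 0.0, Z)

    for d in range(X.shape[1]):  # arbitrary order, an assumption
        rows_missing = missing[:, d]
        if not rows_missing.any():
            continue
        # Inputs are all other columns, including values imputed earlier.
        others = np.delete(filled, d, axis=1)
        filled[rows_missing, d] = gp_posterior_mean(
            others[~rows_missing],
            filled[~rows_missing, d],
            others[rows_missing],
        )

    # Undo the scaling and keep observed entries exactly as given.
    return np.where(missing, filled * std + mean, X)
```

Calling `chained_gp_impute(X)` on a 2-D array with `np.nan` entries returns an array of the same shape in which observed entries are untouched and each missing entry holds a point prediction. A version closer to the abstract would also propagate the predictive distribution of each imputed value into the later GPs rather than only its mean.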
Pages: 12