Gaussian processes for missing value imputation

Cited by: 10
Authors
Jafrasteh, Bahram [1 ]
Hernandez-Lobato, Daniel [2 ]
Lubian-Lopez, Simon Pedro
Benavente-Fernandez, Isabel [1 ,3 ,4 ]
Affiliations
[1] Puerta Mar Univ, Biomed Res & Innovat Inst, Cadiz INiB Res Unit, Cadiz, Spain
[2] Univ Autonoma Madrid, Comp Sci Dept, Madrid, Spain
[3] Puerta Mar Univ Hosp, Dept Pediat, Div Neonatol, Cadiz, Spain
[4] Univ Cadiz, Med Sch, Dept Child & Mother Hlth & Radiol, Area Pediat, Cadiz, Spain
Keywords
Missing values; Gaussian process; Deep learning; Deep Gaussian processes; Variational inference
DOI
10.1016/j.knosys.2023.110603
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A missing value indicates that a particular attribute of an instance of a learning problem is not recorded. Missing values are very common in real-life datasets. Nevertheless, most machine learning methods cannot handle them, so they should be imputed before training. Gaussian processes (GPs) are non-parametric models with accurate uncertainty estimates that, combined with sparse approximations and stochastic variational inference, scale to large datasets. Sparse GPs (SGPs) can be used to obtain a predictive distribution for missing values. We present a hierarchical composition of sparse GPs that predicts the missing values in each dimension using the observed values from the other dimensions. Importantly, we consider that the input attributes to each sparse GP used for prediction may also have missing values. The missing values in those input attributes are replaced by the predictions of the previous sparse GPs in the hierarchy. We call our approach missing GP (MGP). MGP can impute all observed missing values. It outputs a predictive distribution for each missing value, which is then used in the imputation of other missing values. We evaluate MGP on one private clinical dataset and on four UCI datasets with different percentages of missing values. Furthermore, we compare the performance of MGP with other state-of-the-art methods for imputing missing values, including variants based on sparse GPs and deep GPs. Our results show that the performance of MGP is significantly better. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
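The chaining idea in the abstract (each dimension is predicted from the other dimensions, whose own gaps are filled by earlier predictions) can be illustrated with a small sketch. This is not the authors' MGP: it swaps in an exact GP with a fixed RBF kernel instead of sparse GPs trained with stochastic variational inference, and it passes along only the predictive mean rather than a full predictive distribution. The function names, the kernel length scale, the noise level, the number of sweeps, and the mean-value initialization are all assumptions made for illustration.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_mean(X_train, y_train, X_test, noise=1e-2):
    """Posterior mean of an exact GP regressor fitted to centred targets."""
    mu = sum(y_train) / len(y_train)
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X_train)]
         for i, a in enumerate(X_train)]
    alpha = solve(K, [y - mu for y in y_train])
    return [mu + sum(rbf(x, xt) * w for xt, w in zip(X_train, alpha))
            for x in X_test]

def chained_gp_impute(rows, sweeps=2):
    """Impute None entries dimension by dimension.

    Assumes every column has at least one observed value.
    """
    n_dims = len(rows[0])
    missing = [[v is None for v in row] for row in rows]
    filled = [row[:] for row in rows]
    # Start from column means so every GP has complete inputs.
    for d in range(n_dims):
        obs = [row[d] for row in rows if row[d] is not None]
        col_mean = sum(obs) / len(obs)
        for row in filled:
            if row[d] is None:
                row[d] = col_mean
    for _ in range(sweeps):
        for d in range(n_dims):
            train = [i for i in range(len(rows)) if not missing[i][d]]
            test = [i for i in range(len(rows)) if missing[i][d]]
            if not test:
                continue

            def others(i, d=d):
                # Inputs are the other dimensions, including earlier imputations.
                return [filled[i][k] for k in range(n_dims) if k != d]

            preds = gp_mean([others(i) for i in train],
                            [filled[i][d] for i in train],
                            [others(i) for i in test])
            for i, p in zip(test, preds):
                filled[i][d] = p
    return filled

data = [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0],
        [1.5, None], [None, 4.0], [2.5, 5.0]]
completed = chained_gp_impute(data)
```

Observed entries are never overwritten; only the positions that held `None` receive predictions, and later dimensions see the values imputed for earlier ones within the same sweep.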
Pages: 12