Leveraging Variational Autoencoders for Multiple Data Imputation

Cited by: 3
Authors
Roskams-Hieter, Breeshey [1, 2]
Wells, Jude [2, 3]
Wade, Sara [1]
Affiliations
[1] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[2] Hlth Data Res UK, London, England
[3] UCL, London, England
Funding
Wellcome Trust (UK)
Keywords
VAEs; multiple imputation; missing data
DOI
10.1007/978-3-031-43412-9_29
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to learn complex and non-linear relationships. In this work, we investigate the ability of variational autoencoders (VAEs) to account for uncertainty in missing data through multiple imputation. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations. To overcome this, we employ beta-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of beta is critical for uncertainty calibration, and we demonstrate how this can be achieved using cross-validation. We assess three alternative methods for sampling from the posterior distribution of missing values and apply the approach to transcriptomics datasets with various simulated missingness scenarios. Finally, we show that single imputation in transcriptomic data can cause false discoveries in downstream tasks and that employing multiple imputation with beta-VAEs can effectively mitigate these inaccuracies.
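The notion of empirical coverage mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `empirical_coverage`, the nearest-rank interval rule, and the Gaussian toy data are all assumptions made for illustration. For each masked entry, several imputed draws form a central interval, and coverage is the fraction of held-out true values that fall inside it. An overconfident imputer (draws that are too narrow) should score well below the nominal level.

```python
import math
import random


def empirical_coverage(draws, truth, level=0.95):
    """Fraction of true values inside the central `level` interval of their draws.

    draws: list of lists, one list of imputed samples per missing entry.
    truth: list of held-out true values, aligned with `draws`.
    The interval uses a simple nearest-rank rule (an illustrative choice).
    """
    alpha = (1.0 - level) / 2.0
    hits = 0
    for samples, true_value in zip(draws, truth):
        ordered = sorted(samples)
        m = len(ordered)
        lower = ordered[math.floor(alpha * (m - 1))]
        upper = ordered[math.ceil((1.0 - alpha) * (m - 1))]
        if lower <= true_value <= upper:
            hits += 1
    return hits / len(truth)


# Toy data (assumed): each missing entry has true conditional law N(mu_i, 1).
rng = random.Random(0)
n_entries, n_draws = 1000, 200
means = [rng.gauss(0.0, 2.0) for _ in range(n_entries)]
truth = [rng.gauss(mu, 1.0) for mu in means]

# A calibrated imputer samples with the correct spread ...
calibrated_draws = [[rng.gauss(mu, 1.0) for _ in range(n_draws)] for mu in means]
# ... while an overconfident one samples with a spread that is too small.
narrow_draws = [[rng.gauss(mu, 0.3) for _ in range(n_draws)] for mu in means]

calibrated = empirical_coverage(calibrated_draws, truth)
overconfident = empirical_coverage(narrow_draws, truth)
print(f"calibrated coverage:    {calibrated:.3f}")
print(f"overconfident coverage: {overconfident:.3f}")
```

In a cross-validation setting like the one the abstract describes, a check of this kind could be repeated on held-out entries for each candidate value of beta, preferring the value whose coverage is closest to the nominal level. How the authors actually score candidates is not stated in this record.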
Pages: 491-506
Page count: 16