Leveraging Variational Autoencoders for Multiple Data Imputation

Times cited: 3
Authors
Roskams-Hieter, Breeshey [1, 2]
Wells, Jude [2, 3]
Wade, Sara [1]
Affiliations
[1] University of Edinburgh, Edinburgh, Midlothian, Scotland
[2] Health Data Research UK, London, England
[3] UCL, London, England
Funding
Wellcome Trust (UK);
Keywords
VAEs; multiple imputation; missing data;
DOI
10.1007/978-3-031-43412-9_29
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to learn complex and non-linear relationships. In this work, we investigate the ability of variational autoencoders (VAEs) to account for uncertainty in missing data through multiple imputation. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations. To overcome this, we employ beta-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of beta is critical for uncertainty calibration, and we demonstrate how this can be achieved using cross-validation. We assess three alternative methods for sampling from the posterior distribution of missing values and apply the approach to transcriptomics datasets with various simulated missingness scenarios. Finally, we show that single imputation in transcriptomic data can cause false discoveries in downstream tasks and that employing multiple imputation with beta-VAEs can effectively mitigate these inaccuracies.
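The "empirical coverage" mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names (`quantile`, `empirical_coverage`, `simulate`), the Gaussian toy data, and the 95% level are assumptions chosen for illustration only. The idea is that, for each masked entry with a known true value, one checks whether that value falls inside the central interval spanned by its multiple imputed draws; well-calibrated imputations should cover roughly the nominal fraction, while overconfident (too narrow) imputations cover far less.

```python
import random


def quantile(sorted_draws, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] * (1.0 - frac) + sorted_draws[hi] * frac


def empirical_coverage(true_values, imputations, level=0.95):
    """Fraction of true values inside the central `level` interval of their draws.

    true_values: one held-out true value per masked entry.
    imputations: imputations[i] holds the multiple imputed draws for entry i.
    """
    tail = (1.0 - level) / 2.0
    hits = 0
    for truth, draws in zip(true_values, imputations):
        ordered = sorted(draws)
        if quantile(ordered, tail) <= truth <= quantile(ordered, 1.0 - tail):
            hits += 1
    return hits / len(true_values)


def simulate(spread, n_entries=500, n_draws=200, seed=0):
    """Toy check (hypothetical data): truths are N(0, 1); draws are N(0, spread)."""
    rng = random.Random(seed)
    truths = [rng.gauss(0.0, 1.0) for _ in range(n_entries)]
    draws = [[rng.gauss(0.0, spread) for _ in range(n_draws)]
             for _ in range(n_entries)]
    return empirical_coverage(truths, draws)


# Matching spread should land near the nominal level; a too-narrow spread
# mimics the overconfident imputations described in the abstract.
print("calibrated:", simulate(spread=1.0))
print("overconfident:", simulate(spread=0.3))
```

In a cross-validation setting such as the one the abstract describes for choosing beta, a quantity like this could be computed on artificially masked entries for each candidate beta; how the authors actually score candidates is not stated in this record.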
Pages: 491-506
Page count: 16