MIVAE: Multiple Imputation based on Variational Auto-Encoder

Cited by: 10
Authors
Ma, Qian [1]
Li, Xia [1]
Bai, Mei [1]
Wang, Xite [1]
Ning, Bo [1]
Li, Guanyu [1]
Affiliations
[1] Dalian Maritime University, School of Information Science and Technology, Dalian 116026, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Missing value; Multiple imputation; Variational Auto-Encoder; Data quality; Missing data; Inference
DOI
10.1016/j.engappai.2023.106270
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Missing values (MVs) are prevalent in real-world datasets and pose challenges for advanced data analytics algorithms, so MV imputation has become a research hotspot in the field of data quality. To impute MVs, most existing approaches directly derive one estimate for each MV, which is categorized as single imputation (SI). However, SI ignores the uncertainty of the MVs and therefore usually yields less satisfactory imputation results than multiple imputation (MI). To capture the uncertainty of the MVs, MI algorithms derive multiple candidate estimates for each MV. Nevertheless, existing MI approaches are few because of the complicated data-handling process. Accordingly, in this paper, by exploring the Variational Auto-Encoder (VAE) model, we propose a new MI approach, namely MIVAE (Multiple Imputation based on Variational Auto-Encoder), to impute MVs in tabular data. In MIVAE, we first add a corrupted input layer (where synthetic MVs are introduced) adjacent to the original input layer so that the model can handle MVs. Then, we obtain multiple, rather than single, candidate estimates for each data sample from the posterior distribution of the latent variables learned by our model. In this way, multiple imputation is implemented effectively and the uncertainty of the MVs is captured. Next, to obtain satisfactory imputation results, we add a data analysis layer at the end of the network to integrate the multiple candidate estimates. Finally, experimental results on four real-world datasets demonstrate that MIVAE achieves significantly higher imputation accuracy than existing solutions and that MIVAE is capable of handling both numerical and categorical tabular data.
For example, on the CropMapping dataset, the imputation accuracy of MIVAE improves by up to about 40% and 30% compared with PMM and MIWAE (which are state-of-the-art MI approaches), respectively. Moreover, we train a separate MIVAE model on each of three datasets containing MVs. With the trained MIVAE, the classification performance on the imputed data is similar to that on the complete data.
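The three steps named in the abstract (introduce synthetic MVs, draw several candidate estimates per MV, then integrate them) can be illustrated with a minimal sketch. This is not the authors' implementation: a per-column Gaussian fitted to the observed values stands in for the posterior learned by the VAE, a plain average stands in for the data analysis layer, and the function names `corrupt`, `draw_candidates`, and `multiple_impute` are hypothetical.

```python
import random
import statistics


def corrupt(row, rate, rng):
    """Replace observed entries with None at the given rate (synthetic MVs)."""
    return [None if (v is not None and rng.random() < rate) else v for v in row]


def draw_candidates(observed_values, m, rng):
    """Draw m candidate estimates for one missing cell.

    Assumption: a Gaussian fitted to the column's observed values stands in
    for sampling from a learned posterior. It is not a VAE.
    """
    mu = statistics.fmean(observed_values)
    sigma = statistics.pstdev(observed_values)
    return [rng.gauss(mu, sigma) for _ in range(m)]


def multiple_impute(data, m=20, seed=0):
    """Fill each None with the mean of m candidate draws.

    Assumption: averaging stands in for the data analysis layer.
    Every column must contain at least one observed value.
    """
    rng = random.Random(seed)
    n_cols = len(data[0])
    # Collect the observed values of each column once.
    observed = [[row[j] for row in data if row[j] is not None]
                for j in range(n_cols)]
    imputed = []
    for row in data:
        new_row = []
        for j, value in enumerate(row):
            if value is None:
                candidates = draw_candidates(observed[j], m, rng)
                new_row.append(statistics.fmean(candidates))
            else:
                new_row.append(value)
        imputed.append(new_row)
    return imputed


if __name__ == "__main__":
    table = [[1.0, 10.0], [2.0, None], [None, 30.0], [4.0, 40.0]]
    print(multiple_impute(table, m=50))
```

Keeping the m candidates instead of averaging them immediately would preserve the spread of the draws, which is the uncertainty that single imputation discards.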
Pages: 14