MIVAE: Multiple Imputation based on Variational Auto-Encoder

Cited by: 10
Authors
Ma, Qian [1]
Li, Xia [1]
Bai, Mei [1]
Wang, Xite [1]
Ning, Bo [1]
Li, Guanyu [1]
Affiliations
[1] Dalian Maritime Univ, Sch Informat Sci & Technol, Dalian 116026, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Missing value; Multiple imputation; Variational Auto-Encoder; Data quality; MISSING DATA; INFERENCE;
DOI
10.1016/j.engappai.2023.106270
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Missing value (MV) imputation has become a research hotspot in the field of data quality, since MVs are prevalent in real-world datasets and pose challenges to advanced data analytics algorithms. To impute the MVs, most existing approaches directly derive one estimation for each MV, which is categorized as single imputation (SI). However, SI ignores the uncertainty of the MVs, and therefore usually yields less satisfactory imputation results than multiple imputation (MI). To capture the uncertainty of the MVs, MI algorithms derive multiple candidate estimations for each MV. Nevertheless, few MI approaches exist, owing to the complicated data-handling process. Accordingly, in this paper, by exploring the Variational Auto-Encoder (VAE) model, we propose a new MI approach, namely MIVAE (Multiple Imputation based on Variational Auto-Encoder), to impute MVs in tabular data. In MIVAE, we first add a corrupted input layer (where synthetic MVs are introduced) adjacent to the original input layer so that the model can handle MVs. Then, we obtain multiple rather than single candidate estimations for each data sample from the posterior distribution of the latent variables learned by our designed model. In this way, multiple imputation is effectively implemented and the uncertainty of the MVs is captured. Next, to obtain satisfactory imputation results, we add a data analysis layer at the end of the network to integrate the multiple candidate estimations intelligently. Finally, the experimental results over four real-world datasets demonstrate that MIVAE achieves significantly higher imputation accuracy than existing solutions, and that MIVAE is capable of handling both numerical and categorical tabular data. For example, over the CropMapping dataset, the imputation accuracy of MIVAE improves by up to about 40% and 30% compared with PMM and MIWAE (which are state-of-the-art MI approaches), respectively. Moreover, we train a MIVAE model over each of three datasets containing MVs. By leveraging the trained MIVAE, the classification performance over the imputed data is similar to that over the complete data.
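The sampling-and-pooling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `multiple_impute`, `toy_encode`, `toy_decode`, and the fixed weights are hypothetical stand-ins for a trained VAE, the corrupted input layer is omitted, and a plain average replaces the learned data analysis layer that the paper uses to integrate candidates.

```python
import numpy as np

def multiple_impute(x, mask, encode, decode, k=10, rng=None):
    """Fill cells where mask is False by pooling k decoded latent samples.

    encode(x_filled) -> (mu, log_var) of a Gaussian posterior over latents.
    decode(z)        -> reconstruction with the same shape as x.
    Both callables are hypothetical stand-ins for a trained VAE.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_filled = np.where(mask, x, 0.0)            # zero-fill missing cells before encoding
    mu, log_var = encode(x_filled)
    std = np.exp(0.5 * log_var)
    # Draw k latent samples per row: z = mu + std * eps
    eps = rng.standard_normal((k,) + mu.shape)
    candidates = np.stack([decode(mu + std * e) for e in eps])  # shape (k, n, d)
    pooled = candidates.mean(axis=0)             # assumption: simple average, not a learned layer
    imputed = np.where(mask, x, pooled)          # observed cells are left untouched
    return imputed, candidates

# Toy linear "encoder" and "decoder" with fixed, made-up weights.
W_ENC = np.array([[0.5], [0.5]])   # maps 2 features to 1 latent dimension
W_DEC = np.array([[1.0, 1.0]])     # maps 1 latent dimension back to 2 features

def toy_encode(x_filled):
    mu = x_filled @ W_ENC
    log_var = np.full_like(mu, -2.0)             # constant posterior variance
    return mu, log_var

def toy_decode(z):
    return z @ W_DEC

x = np.array([[1.0, np.nan],
              [2.0, 2.0]])
mask = ~np.isnan(x)                              # True where a value is observed
imputed, candidates = multiple_impute(
    x, mask, toy_encode, toy_decode, k=5, rng=np.random.default_rng(0)
)
```

Here `candidates` holds the k candidate estimations for every cell, so their spread gives a rough view of the uncertainty for each missing value, while `imputed` keeps the observed values and fills only the missing ones.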
Pages: 14