Deep Generative Imputation Model for Missing Not At Random Data

Citations: 0
Authors
Chen, Jialei [1]
Xu, Yuanbo [1]
Wang, Pengyang [2]
Yang, Yongjian [1]
Affiliations
[1] Jilin Univ, Dept Comp Sci & Technol, MIC Lab, Changchun, Peoples R China
[2] Univ Macau, Dept Comp & Informat Sci, SKL IOTSC, Macau, Peoples R China
Keywords
Missing Data; Missing Not At Random; Imputation; Deep Generative Models; Variational Autoencoder
DOI
10.1145/3583780.3614835
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data analysis often suffers from the Missing Not At Random (MNAR) problem, in which the cause of missingness is not fully observed. Compared with the simpler Missing Completely At Random (MCAR) setting, MNAR is closer to realistic scenarios but more complex and challenging. Existing statistical methods model the MNAR mechanism through different decompositions of the joint distribution of the complete data and the missing mask. However, we find empirically that directly incorporating these statistical methods into deep generative models is sub-optimal. Specifically, doing so neglects the confidence of the reconstructed mask during MNAR imputation, which leads to insufficient information extraction and weaker guarantees on imputation quality. In this paper, we revisit the MNAR problem from a novel perspective: the complete data and the missing mask are two modalities of the incomplete data, on an equal footing. Following this line, we put forward a joint probability decomposition method specific to generative models, the conjunction model, which represents the distributions of the two modalities in parallel and extracts sufficient information from both the complete data and the missing mask. Taking a step further, we introduce a deep generative imputation model, GNR, which processes the real-world missing mechanism in the latent space and concurrently imputes the incomplete data and reconstructs the missing mask. Experimental results show that GNR surpasses state-of-the-art MNAR baselines by significant margins (average RMSE improvements ranging from 9.9% to 18.8%) and consistently achieves better mask reconstruction accuracy, which makes the imputation more principled.
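The MCAR/MNAR distinction and the RMSE metric mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the GNR model or the conjunction model. It only shows, under assumed settings, why a value-dependent (MNAR) mask is harder than an MCAR mask for a naive imputer. The logistic self-masking mechanism, the mean imputer, and all function names (`make_masks`, `mean_impute`, `rmse_on_missing`, `demo`) are hypothetical choices made for illustration.

```python
import numpy as np


def make_masks(x, rng, miss_rate=0.5, slope=3.0):
    """Return (mcar_observed, mnar_observed); True marks an observed entry."""
    # MCAR: every entry is dropped with the same probability,
    # independent of its value.
    mcar_observed = rng.random(x.shape) >= miss_rate
    # MNAR (assumed self-masking mechanism): the probability of being
    # missing grows with the value itself, via a logistic function.
    p_miss = 1.0 / (1.0 + np.exp(-slope * x))
    mnar_observed = rng.random(x.shape) >= p_miss
    return mcar_observed, mnar_observed


def mean_impute(x, observed):
    """Fill missing entries with the per-column mean of observed entries."""
    col_means = np.nanmean(np.where(observed, x, np.nan), axis=0)
    return np.where(observed, x, col_means)


def rmse_on_missing(x_true, x_imputed, observed):
    """RMSE computed only over the entries that were missing."""
    missing = ~observed
    diff = x_true[missing] - x_imputed[missing]
    return float(np.sqrt(np.mean(diff ** 2)))


def demo(seed=0, n=5000, d=3):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))  # synthetic complete data
    mcar, mnar = make_masks(x, rng)
    return {
        "true_mean": float(x.mean()),
        "mnar_observed_mean": float(x[mnar].mean()),
        "rmse_mcar": rmse_on_missing(x, mean_impute(x, mcar), mcar),
        "rmse_mnar": rmse_on_missing(x, mean_impute(x, mnar), mnar),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting the observed entries under the MNAR mask are biased toward small values, so the column means used for imputation are biased as well. This is the kind of information carried by the mask that an imputation method ignoring the missing mechanism cannot use.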
Pages: 316-325
Number of pages: 10
Related papers
50 entries in total
  • [41] Deep learning for missing value imputation of continuous data and the effect of data discretization
    Lin, Wei-Chao
    Tsai, Chih-Fong
    Zhong, Jia Rong
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 239
  • [42] Missing data imputation: focusing on single imputation
    Zhang, Zhongheng
    [J]. ANNALS OF TRANSLATIONAL MEDICINE, 2016, 4 (01)
  • [43] Spatiotemporal Generative Adversarial Imputation Networks: An Approach to Address Missing Data for Wind Turbines
    Hu, Xuguang
    Zhan, Zhaokang
    Ma, Dazhong
    Zhang, Siqi
    [J]. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [44] Federated conditional generative adversarial nets imputation method for air quality missing data
    Zhou, Xu
    Liu, Xiaofeng
    Lan, Gongjin
    Wu, Jian
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 228
  • [45] Multiple imputation for non-monotone missing not at random data using the no self-censoring model
    Ren, Boyu
    Lipsitz, Stuart R.
    Weiss, Roger D.
    Fitzmaurice, Garrett M.
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2023, 32 (10) : 1973 - 1993
  • [46] Guided multiple imputation of missing data - Using a subsample to strengthen the missing-at-random assumption
    Fraser, Gary
    Yan, Ru
    [J]. EPIDEMIOLOGY, 2007, 18 (02) : 246 - 252
  • [47] VIGAN: Missing View Imputation with Generative Adversarial Networks
    Shang, Chao
    Palmer, Aaron
    Sun, Jiangwen
    Chen, Ko-Shin
    Lu, Jin
    Bi, Jinbo
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017: 766 - 775
  • [48] Missing Data Imputation for Health Care Big Data Using Denoising Autoencoder with Generative Adversarial Network
    Zhang, Yinbing
    [J]. SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2024, 25 (05) : 3850 - 3857
  • [49] Missing Data: data replacement and imputation
    Hutcheson, Graeme
    Pampaka, Maria
    [J]. JOURNAL OF MODELLING IN MANAGEMENT, 2012, 7 (02)
  • [50] LFM-D2GAIN: An Improved Missing Data Imputation Method Based on Generative Adversarial Imputation Nets
    Shen, Yebai
    Zhang, Chao
    Zhang, Songyu
    Yan, Jinghua
    Bu, Fanliang
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, BIG DATA AND ALGORITHMS (EEBDA), 2022: 447 - 453