Identifiable Generative Models for Missing Not at Random Data Imputation

Cited by: 0
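Authors
Ma, Chao [1, 2]
Zhang, Cheng [2]
Affiliations
[1] University of Cambridge, Cambridge, England
[2] Microsoft Research Cambridge, Cambridge, England
Keywords
MULTIPLE IMPUTATION; MAXIMUM-LIKELIHOOD
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract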
Real-world datasets often have missing values produced by complex generative processes in which the cause of the missingness may not be fully observed. Such data are known as missing not at random (MNAR). However, many imputation methods ignore the missingness mechanism, which yields biased imputations when the data are MNAR. Although a few methods have considered the MNAR scenario, the identifiability of their models under MNAR is generally not guaranteed. That is, the model parameters cannot be uniquely determined even with infinitely many data samples, so the imputations produced by such models can still be biased. This issue is overlooked in particular by many modern deep generative models. In this work, we fill this gap by systematically analyzing the identifiability of generative models under MNAR. Furthermore, we propose a practical deep generative model that provides identifiability guarantees under mild assumptions for a wide range of MNAR mechanisms. Our method demonstrates a clear advantage on tasks with both synthetic data and multiple real-world scenarios with MNAR data.
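The bias that the abstract attributes to ignoring the missingness mechanism can be illustrated with a small simulation. The sketch below is not the paper's model. It assumes a hypothetical self-masking mechanism in which larger values are more likely to be missing, P(observed | x) = sigmoid(-slope * x), and the function name `simulate_mnar` and its parameters are chosen here purely for illustration.

```python
import math
import random
import statistics


def simulate_mnar(n=50_000, slope=2.0, seed=0):
    """Draw x ~ N(0, 1) and mask it with an assumed self-masking MNAR rule."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed mechanism: P(observed | x) = sigmoid(-slope * x),
    # so the missingness depends on the value that goes missing.
    observed = [v for v in x if rng.random() < 1.0 / (1.0 + math.exp(slope * v))]
    return x, observed


x, observed = simulate_mnar()
true_mean = statistics.fmean(x)
observed_mean = statistics.fmean(observed)

print(f"fraction observed: {len(observed) / len(x):.3f}")
print(f"mean of full data: {true_mean:.3f}")
print(f"mean of observed : {observed_mean:.3f}")
```

Under this assumed mechanism the observed values are systematically smaller than the full data, so any imputation that fills the gaps from the observed distribution alone (mean imputation, for example) inherits the same downward bias, no matter how many samples are collected.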
Pages: 14