Identifiable Generative Models for Missing Not at Random Data Imputation

Cited by: 0
Authors
Ma, Chao [1, 2]
Zhang, Cheng [2]
Affiliations
[1] University of Cambridge, Cambridge, England
[2] Microsoft Research Cambridge, Cambridge, England
Keywords
Multiple imputation; Maximum likelihood
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Real-world datasets often have missing values associated with complex generative processes, where the cause of the missingness may not be fully observed. This is known as missing not at random (MNAR) data. However, many imputation methods do not take the missingness mechanism into account, which yields biased imputed values when MNAR data is present. Although a few methods have considered the MNAR scenario, the identifiability of their models under MNAR is generally not guaranteed. That is, the model parameters cannot be uniquely determined even with infinitely many data samples, so the imputation results given by such models can still be biased. This issue is particularly overlooked in many modern deep generative models. In this work, we fill this gap by systematically analyzing the identifiability of generative models under MNAR. Furthermore, we propose a practical deep generative model that provides identifiability guarantees under mild assumptions for a wide range of MNAR mechanisms. Our method shows a clear advantage on tasks involving both synthetic data and multiple real-world scenarios with MNAR data.
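To make the bias claim concrete, here is a toy Python sketch. It is not the authors' model. It assumes a single standard-normal variable and two hypothetical masking mechanisms chosen purely for illustration: MCAR, where each value is hidden with probability 0.5, and a self-masking MNAR mechanism, where positive values are hidden with probability 0.9 and the rest with probability 0.1. The sketch then applies plain mean imputation, which ignores the mechanism. The helper names (`simulate`, `mean_impute`, `imputation_bias`) are illustrative only.

```python
import random
import statistics


def simulate(n, mechanism, seed=0):
    """Draw x ~ N(0, 1) and hide some entries (hypothetical toy mechanisms).

    "mcar": each value is hidden with probability 0.5, independently of x.
    "mnar": a value is hidden with probability 0.9 if x > 0, else 0.1, so
            missingness depends on the (unobserved) value itself.
    Returns the full data and a copy with None marking missing entries.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        if mechanism == "mcar":
            p_missing = 0.5
        elif mechanism == "mnar":
            p_missing = 0.9 if x > 0 else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        full.append(x)
        observed.append(None if rng.random() < p_missing else x)
    return full, observed


def mean_impute(observed):
    """Fill each missing entry with the mean of the observed entries.

    This ignores the missingness mechanism entirely.
    """
    fill = statistics.fmean(v for v in observed if v is not None)
    return [fill if v is None else v for v in observed]


def imputation_bias(n=20000, mechanism="mnar", seed=0):
    """Mean of the imputed data minus the mean of the true (full) data."""
    full, observed = simulate(n, mechanism, seed)
    return statistics.fmean(mean_impute(observed)) - statistics.fmean(full)


if __name__ == "__main__":
    for mech in ("mcar", "mnar"):
        print(mech, round(imputation_bias(mechanism=mech), 3))
```

Under MCAR the observed values are a random subsample, so the bias stays close to zero. Under the self-masking mechanism large values are hidden far more often, so the imputed mean is pulled well below the true mean (roughly -0.64 in expectation for this setup). Correcting such bias requires modelling how missingness depends on the data. The abstract's further point is that this model must also be identifiable for the correction to be reliable.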
Pages: 14
Related papers
50 in total
  • [21] Differentiable and Scalable Generative Adversarial Models for Data Imputation
    Wu, Yangyang
    Wang, Jun
    Miao, Xiaoye
    Wang, Wenjia
    Yin, Jianwei
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (02) : 490 - 503
  • [22] Missing value imputation on missing completely at random data using multilayer perceptrons
    Silva-Ramirez, Esther-Lydia
    Pino-Mejias, Rafael
    Lopez-Coello, Manuel
    Cubiles-de-la-Vega, Maria-Dolores
    [J]. NEURAL NETWORKS, 2011, 24 (01) : 121 - 129
  • [23] Missing Data Imputation Through the Use of the Random Forest Algorithm
    Pantanowitz, Adam
    Marwala, Tshilidzi
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE, 2009, 61 : 53 - 62
  • [25] Imputation of data Missing Not at Random: Artificial generation and benchmark analysis
    Pereira, Ricardo Cardoso
    Abreu, Pedro Henriques
    Rodrigues, Pedro Pereira
    Figueiredo, Mario A. T.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [26] Imputation is beneficial for handling missing data in predictive models
    Steyerberg, Ewout W.
    van Veen, Mirjam
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2007, 60 (09) : 979 - 979
  • [27] Imputation and low-rank estimation with Missing Not At Random data
    Sportisse, Aude
    Boyer, Claire
    Josse, Julie
    [J]. STATISTICS AND COMPUTING, 2020, 30 (06) : 1629 - 1643
  • [28] Auxiliary Variables in Multiple Imputation When Data Are Missing Not at Random
    Mustillo, Sarah
    Kwon, Soyoung
    [J]. JOURNAL OF MATHEMATICAL SOCIOLOGY, 2015, 39 (02): 73 - 91
  • [29] Cautious Classification with Data Missing Not at Random Using Generative Random Forests
    Llerena, Julissa Villanueva
    Maua, Denis Deratani
    Antonucci, Alessandro
    [J]. SYMBOLIC AND QUANTITATIVE APPROACHES TO REASONING WITH UNCERTAINTY, ECSQARU 2021, 2021, 12897 : 284 - 298
  • [30] Improved generative adversarial network with deep metric learning for missing data imputation
    Al-taezi, Mohammed Ali
    Wang, Yu
    Zhu, Pengfei
    Hu, Qinghua
    Al-badwi, Abdulrahman
    [J]. NEUROCOMPUTING, 2024, 570