Imputation and low-rank estimation with Missing Not At Random data

Cited by: 25
Authors
Sportisse, Aude [1, 2]
Boyer, Claire [1, 3]
Josse, Julie [2, 4]
Affiliations
[1] Sorbonne Univ, Lab Probabilites Stat & Modelisat, Paris, France
[2] Ecole Polytech, Ctr Math Appl, Palaiseau, France
[3] Ecole Normale Super, Dept Math & Applicat, Paris, France
[4] INRIA Saclay, XPOP, Palaiseau, France
Keywords
Informative missing values; Denoising; Matrix completion; Accelerated proximal gradient method; EM algorithm; Nuclear norm penalty; THRESHOLDING ALGORITHM; MAXIMUM-LIKELIHOOD; MATRIX COMPLETION; SHRINKAGE; MODELS;
DOI
10.1007/s11222-020-09963-5
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Missing values challenge data analysis because many supervised and unsupervised learning methods cannot be applied directly to incomplete data. Matrix completion methods based on low-rank assumptions are a very powerful solution for dealing with missing values. However, existing methods do not consider the case of informative missing values, which are widely encountered in practice. This paper proposes matrix completion methods to recover Missing Not At Random (MNAR) data. Our first contribution is a model-based estimation strategy that models the distribution of the missing mechanism. An EM algorithm is then implemented, involving a Fast Iterative Soft-Thresholding Algorithm (FISTA). Our second contribution is a computationally efficient surrogate estimation that implicitly takes into account the joint distribution of the data and the missing mechanism: the data matrix is concatenated with the mask coding for the missing values, and a low-rank structure for exponential family is assumed on this new matrix in order to encode links between variables and missing mechanisms. This methodology, which has the great advantage of handling different missing-value mechanisms, is robust to model specification errors. The performance of our methods is assessed on real data collected from a trauma registry (TraumaBase®) containing clinical information about over twenty thousand severely traumatized patients in France. The aim is then to predict whether doctors should administer tranexamic acid, which would limit excessive bleeding, to patients with traumatic brain injury.
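The concatenation idea in the abstract's second contribution can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain quadratic loss on every column (the abstract calls for an exponential-family low-rank model, which would treat the binary mask differently) and uses a basic iterative soft-thresholded SVD rather than FISTA. The names `soft_threshold_svd` and `impute_with_mask`, and the parameters `lam` and `n_iter`, are hypothetical.

```python
import numpy as np


def soft_threshold_svd(X, lam):
    """Proximal operator of lam * nuclear norm: shrink singular values by lam."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt


def impute_with_mask(Y, lam=1.0, n_iter=200):
    """Impute NaNs in Y by low-rank estimation of the concatenated matrix [Y, mask].

    Assumption: quadratic loss on both blocks (a simplification of the
    exponential-family model mentioned in the abstract).
    """
    mask = np.isnan(Y).astype(float)  # 1 = missing, 0 = observed
    # The data block is only partly observed; the mask block is fully observed.
    observed = np.hstack([~np.isnan(Y), np.ones_like(mask, dtype=bool)])
    Z = np.hstack([np.nan_to_num(Y, nan=0.0), mask])
    Theta = np.zeros_like(Z)
    for _ in range(n_iter):
        # Keep observed entries, fill the missing ones with the current estimate.
        filled = np.where(observed, Z, Theta)
        Theta = soft_threshold_svd(filled, lam)
    p = Y.shape[1]
    # Return Y with only its missing entries replaced by the low-rank estimate.
    return np.where(np.isnan(Y), Theta[:, :p], Y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    full = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 6))  # rank-2 data
    Y = full.copy()
    # Self-masked MNAR toy mechanism: large values of column 0 go missing.
    Y[full[:, 0] > 1.0, 0] = np.nan
    print(np.isnan(impute_with_mask(Y, lam=0.5)).sum())
```

Because the mask columns sit in the same low-rank matrix as the data, the estimate for a missing entry can borrow information from the missingness pattern itself, which is the intuition the abstract describes.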
Pages: 1629-1643
Page count: 15