Imputation and low-rank estimation with Missing Not At Random data

Cited by: 25
Authors
Sportisse, Aude [1, 2]
Boyer, Claire [1, 3]
Josse, Julie [2, 4]
Affiliations
[1] Sorbonne Université, Laboratoire de Probabilités, Statistique et Modélisation, Paris, France
[2] École Polytechnique, Centre de Mathématiques Appliquées, Palaiseau, France
[3] École Normale Supérieure, Département de Mathématiques et Applications, Paris, France
[4] INRIA Saclay, XPOP, Palaiseau, France
Keywords
Informative missing values; Denoising; Matrix completion; Accelerated proximal gradient method; EM algorithm; Nuclear norm penalty; Thresholding algorithm; Maximum likelihood; Shrinkage; Models
DOI
10.1007/s11222-020-09963-5
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Missing values challenge data analysis because many supervised and unsupervised learning methods cannot be applied directly to incomplete data. Matrix completion methods based on low-rank assumptions are a very powerful solution for dealing with missing values. However, existing methods do not consider the case of informative missing values, which are widely encountered in practice. This paper proposes matrix completion methods to recover Missing Not At Random (MNAR) data. Our first contribution is a model-based estimation strategy that models the distribution of the missing-data mechanism. An EM algorithm is then implemented, involving a Fast Iterative Soft-Thresholding Algorithm (FISTA). Our second contribution is a computationally efficient surrogate estimation that implicitly takes into account the joint distribution of the data and the missing mechanism: the data matrix is concatenated with the mask coding for the missing values, and a low-rank structure for the exponential family is assumed on this new matrix in order to encode links between variables and missing mechanisms. This methodology, which has the great advantage of handling different missing-value mechanisms, is robust to model misspecification. The performance of our methods is assessed on real data collected from a trauma registry (TraumaBase®) containing clinical information on over twenty thousand severely traumatized patients in France. The aim is to predict whether doctors should administer tranexamic acid, which would limit excessive bleeding, to patients with traumatic brain injury.
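The abstract names two technical ingredients: a FISTA solver for nuclear-norm-penalized low-rank estimation, and the concatenation of the data matrix with its missingness mask. The Python/NumPy sketch below illustrates both under simplifying assumptions; it is not the authors' implementation. The helper names (`svt`, `fista_complete`, `self_masked_mnar_mask`, `concat_with_mask`), the logistic self-masked mechanism with hypothetical parameters `phi1` and `phi2`, the penalty `lam`, and the simulated data are all illustrative choices. The FISTA objective shown fits only the observed entries and ignores the missing mechanism, and the low-rank exponential-family fit on the concatenated matrix is not shown.

```python
import numpy as np

def svt(X, tau):
    """Singular value soft-thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def fista_complete(Y, observed, lam, n_iter=200):
    """FISTA for min_T 0.5 * ||observed * (Y - T)||_F^2 + lam * ||T||_*.

    Simplification (assumption): the mask is treated as ignorable here.
    The gradient of the smooth term is observed * (T - Y), which is
    1-Lipschitz, so a unit step size is used.
    """
    Y0 = np.where(observed, Y, 0.0)  # missing entries never enter the loss
    theta = np.zeros_like(Y0)
    z = theta.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = observed * (z - Y0)
        theta_new = svt(z - grad, lam)  # proximal gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = theta_new + ((t - 1.0) / t_new) * (theta_new - theta)  # momentum
        theta, t = theta_new, t_new
    return theta

def self_masked_mnar_mask(Y, phi1, phi2, rng):
    """Hypothetical MNAR mechanism: P(missing | y) = 1 / (1 + exp(-phi1 * (y - phi2)))."""
    p_missing = 1.0 / (1.0 + np.exp(-phi1 * (Y - phi2)))
    return rng.random(Y.shape) < p_missing  # True means the entry is missing

def concat_with_mask(Y, missing):
    """Build [Y | M]: data with NaN at missing entries, followed by 0/1 mask columns."""
    return np.hstack([np.where(missing, np.nan, Y), missing.astype(float)])

# Simulated rank-2 data with Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
n, p, r = 100, 10, 2
Y_true = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))
Y_noisy = Y_true + 0.1 * rng.normal(size=(n, p))
missing = self_masked_mnar_mask(Y_noisy, phi1=3.0, phi2=0.5, rng=rng)
theta_hat = fista_complete(Y_noisy, ~missing, lam=1.0)
YM = concat_with_mask(Y_noisy, missing)
```

With this mechanism, large values are more likely to be missing, which is exactly the informative case the abstract targets. In the model-based approach described above, the low-rank update would sit inside an EM algorithm that also estimates the mechanism parameters, rather than fitting the observed entries alone as this sketch does.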
Pages: 1629-1643 (15 pages)
Journal: Statistics and Computing, vol. 30, 2020