Imputation and low-rank estimation with Missing Not At Random data

Cited by: 25
Authors
Sportisse, Aude [1, 2]
Boyer, Claire [1, 3]
Josse, Julie [2, 4]
Affiliations
[1] Sorbonne Univ, Lab Probabilites Stat & Modelisat, Paris, France
[2] Ecole Polytech, Ctr Math Appl, Palaiseau, France
[3] Ecole Normale Super, Dept Math & Applicat, Paris, France
[4] INRIA Saclay, XPOP, Palaiseau, France
Keywords
Informative missing values; Denoising; Matrix completion; Accelerated proximal gradient method; EM algorithm; Nuclear norm penalty; THRESHOLDING ALGORITHM; MAXIMUM-LIKELIHOOD; MATRIX COMPLETION; SHRINKAGE; MODELS;
DOI
10.1007/s11222-020-09963-5
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Missing values challenge data analysis because many supervised and unsupervised learning methods cannot be applied directly to incomplete data. Matrix completion methods based on low-rank assumptions are a very powerful solution for dealing with missing values. However, existing methods do not consider the case of informative missing values, which are widely encountered in practice. This paper proposes matrix completion methods to recover Missing Not At Random (MNAR) data. Our first contribution is to suggest a model-based estimation strategy by modelling the missing mechanism distribution. An EM algorithm is then implemented, involving a Fast Iterative Soft-Thresholding Algorithm (FISTA). Our second contribution is to suggest a computationally efficient surrogate estimation by implicitly taking into account the joint distribution of the data and the missing mechanism: the data matrix is concatenated with the mask coding for the missing values, and a low-rank structure for exponential family is assumed on this new matrix in order to encode links between variables and missing mechanisms. The methodology, which has the great advantage of handling different missing value mechanisms, is robust to model specification errors. The performance of our methods is assessed on real data collected from a trauma registry (TraumaBase®) containing clinical information about over twenty thousand severely traumatized patients in France. The aim is then to predict whether doctors should administer tranexamic acid, which would limit excessive bleeding, to patients with traumatic brain injury.
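The mask-concatenation idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a plain quadratic loss on both the data and the binary mask (the paper assumes an exponential-family low-rank model), and it uses unaccelerated iterative singular value soft-thresholding rather than FISTA. The function names `svd_soft_threshold` and `impute_with_mask` and the parameters `lam` and `n_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np


def svd_soft_threshold(Z, lam):
    """Proximal operator of lam * nuclear norm: shrink singular values by lam."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt


def impute_with_mask(Y, lam=1.0, n_iter=200):
    """Impute NaN entries of Y using a low-rank fit of [Y | mask].

    Assumption (not from the paper): quadratic loss on every block,
    including the binary mask, which is a simplification.
    """
    observed = ~np.isnan(Y)
    mask = (~observed).astype(float)          # 1.0 where the value is missing
    # Concatenate the data with the mask; the mask block is fully observed.
    Z = np.hstack([np.where(observed, Y, 0.0), mask])
    obs_Z = np.hstack([observed, np.ones_like(observed)])
    theta = np.zeros_like(Z)
    for _ in range(n_iter):
        # Fill unobserved cells with the current estimate, then shrink.
        filled = np.where(obs_Z, Z, theta)
        theta = svd_soft_threshold(filled, lam)
    # Keep observed values; take imputations from the data block of theta.
    return np.where(observed, Y, theta[:, : Y.shape[1]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 5))  # rank-2 data
    Y = X.copy()
    Y[X[:, 0] > 0.5, 0] = np.nan  # MNAR: large values of column 0 are hidden
    print(impute_with_mask(Y, lam=0.5, n_iter=100)[:3])
```

Because the mask columns are always observed, the low-rank fit can borrow information from the missingness pattern when filling in the data columns, which is the intuition behind the surrogate estimation described above.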
Pages: 1629-1643
Number of pages: 15
Related papers
50 in total
  • [11] Scalable low-rank tensor learning for spatiotemporal traffic data imputation
    Chen, Xinyu
    Chen, Yixian
    Saunier, Nicolas
    Sun, Lijun
    [J]. TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2021, 129
  • [12] Expand Dimensional of Seismic Data and Random Noise Attenuation Using Low-Rank Estimation
    Mafakheri, Javad
    Kahoo, Amin Roshandel
    Anvari, Rasoul
    Mohammadi, Mokhtar
    Radad, Mohammad
    Monfared, Mehrdad Soleimani
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 2773 - 2781
  • [13] Low-rank model with covariates for count data with missing values
    Robin, Genevieve
    Josse, Julie
    Moulines, Eric
    Sardy, Sylvain
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2019, 173 : 416 - 434
  • [14] A nonconvex low-rank tensor completion model for spatiotemporal traffic data imputation
    Chen, Xinyu
    Yang, Jinming
    Sun, Lijun
    [J]. TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2020, 117
  • [15] A Novel Spatiotemporal Data Low-Rank Imputation Approach for Traffic Sensor Network
    Chen, Xiaobo
    Liang, Shurong
    Zhang, Zhihao
    Zhao, Feng
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (20) : 20122 - 20135
  • [16] Traffic Data Imputation Algorithm Based on Improved Low-Rank Matrix Decomposition
    Luo, Xianglong
    Meng, Xue
    Gan, Wenjuan
    Chen, Yonghong
    [J]. JOURNAL OF SENSORS, 2019, 2019
  • [17] Algorithms and Literate Programs for Weighted Low-Rank Approximation with Missing Data
    Markovsky, Ivan
    [J]. APPROXIMATION ALGORITHMS FOR COMPLEX SYSTEMS, 2011, 3 : 255 - 273
  • [18] Flexible Low-Rank Statistical Modeling with Missing Data and Side Information
    Fithian, William
    Mazumder, Rahul
    [J]. STATISTICAL SCIENCE, 2018, 33 (02) : 238 - 260
  • [19] LOW-RANK DATA MATRIX RECOVERY WITH MISSING VALUES AND FAULTY SENSORS
    Lopez-Valcarce, Roberto
    Sala-Alvarez, Josep
    [J]. 2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019
  • [20] Clustering a union of low-rank subspaces of different dimensions with missing data
    Ashraphijuo, Morteza
    Wang, Xiaodong
    [J]. PATTERN RECOGNITION LETTERS, 2019, 120 : 31 - 35