A comparison of various software tools for dealing with missing data via imputation

Cited by: 6
Authors
Abrahantes, Jose Cortinas [1]
Sotto, Cristina [1, 2]
Molenberghs, Geert [1, 3]
Vromman, Geert [4]
Bierinckx, Bart [4]
Affiliations
[1] Univ Hasselt, Interuniv Inst Biostat & Stat Bioinformat, B-3590 Diepenbeek, Belgium
[2] Univ Philippines, Sch Stat, Quezon City, Philippines
[3] Katholieke Univ Leuven, Interuniv Inst Biostat & Stat Bioinformat, B-3000 Louvain, Belgium
[4] IM Associates BVBA, Sales & Mkt Effectiveness, B-3000 Louvain, Belgium
Keywords
multiple imputation; missing data; missing at random; missing not at random; random forest; longitudinal data; incomplete data; regression; models
DOI
10.1080/00949655.2010.498788
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation encompasses a broad range of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluations of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may well lie in the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight into the performance of the different routines, procedures, and packages in this respect.
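The idea of judging imputations at the level of the individual value, rather than by the downstream parameter estimates, can be illustrated with a minimal sketch. This is not the paper's simulation design or any of the packages it reviews: the data-generating model, the logistic missingness rule, and all function names below are assumptions chosen for illustration, and both strategies shown are single (not multiple) imputation.

```python
import math
import random
import statistics


def simulate(n=500, seed=0):
    """Draw (x, y) pairs with y linearly related to x plus noise (assumed model)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 + 1.5 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def mar_mask(xs, seed=1):
    """MAR: the chance that y is missing depends only on the fully observed x."""
    rng = random.Random(seed)
    return [rng.random() < 1.0 / (1.0 + math.exp(-x)) for x in xs]


def mean_impute(ys, missing):
    """Replace each missing y with the mean of the observed y values."""
    observed_mean = statistics.fmean(y for y, m in zip(ys, missing) if not m)
    return [observed_mean if m else y for y, m in zip(ys, missing)]


def regression_impute(xs, ys, missing):
    """Replace each missing y with its least-squares prediction from x."""
    obs = [(x, y) for x, y, m in zip(xs, ys, missing) if not m]
    mx = statistics.fmean(x for x, _ in obs)
    my = statistics.fmean(y for _, y in obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    sxy = sum((x - mx) * (y - my) for x, y in obs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * x if m else y for x, y, m in zip(xs, ys, missing)]


def imputation_rmse(true_ys, imputed_ys, missing):
    """RMSE between imputed and true values, computed over the missing cells only."""
    errs = [(t - i) ** 2 for t, i, m in zip(true_ys, imputed_ys, missing) if m]
    return math.sqrt(statistics.fmean(errs))


xs, ys = simulate()
missing = mar_mask(xs)
rmse_mean = imputation_rmse(ys, mean_impute(ys, missing), missing)
rmse_reg = imputation_rmse(ys, regression_impute(xs, ys, missing), missing)
print(f"mean imputation RMSE:       {rmse_mean:.3f}")
print(f"regression imputation RMSE: {rmse_reg:.3f}")
```

Because the true values are known in a simulation, the error can be measured directly on the cells that were set to missing. Under this assumed MAR mechanism, mean imputation ignores the covariate that drives missingness, so its individual-level error should be noticeably larger than that of the regression-based fill.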
Pages: 1653-1675
Number of pages: 23