Comparison of imputation methods for missing laboratory data in medicine

Cited by: 301
Authors
Waljee, Akbar K. [1 ,2 ]
Mukherjee, Ashin [3 ]
Singal, Amit G. [4 ,5 ]
Zhang, Yiwei [3 ]
Warren, Jeffrey [6 ]
Balis, Ulysses [6 ]
Marrero, Jorge [4 ]
Zhu, Ji [3 ]
Higgins, Peter D. R. [1 ]
Affiliations
[1] Univ Michigan, Dept Internal Med, Ann Arbor, MI 48109 USA
[2] Vet Affairs Ctr Clin Management Res, Ann Arbor, MI USA
[3] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
[4] UT Southwestern Med Ctr, Dept Internal Med, Dallas, TX USA
[5] UT Southwestern, Dept Clin Sci, Dallas, TX USA
[6] Univ Michigan, Dept Pathol, Ann Arbor, MI 48109 USA
Source
BMJ OPEN | 2013 / Vol. 3 / Issue 08
Keywords
HEPATOCELLULAR-CARCINOMA;
DOI
10.1136/bmjopen-2013-002847
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Objectives: Missing laboratory data is a common issue, but the optimal method of imputation of missing values has not been determined. The aims of our study were to compare the accuracy of four imputation methods for laboratory data missing completely at random and to compare the effect of the imputed values on the accuracy of two clinical predictive models.
Design: Retrospective cohort analysis of two large data sets.
Setting: A tertiary level care institution in Ann Arbor, Michigan.
Participants: The Cirrhosis cohort had 446 patients and the Inflammatory Bowel Disease cohort had 395 patients.
Methods: Non-missing laboratory data were randomly removed with varying frequencies from two large data sets, and we then compared the ability of four methods (missForest, mean imputation, nearest neighbour imputation and multivariate imputation by chained equations (MICE)) to impute the simulated missing data. We characterised the accuracy of the imputation and the effect of the imputation on predictive ability in two large data sets.
Results: MissForest had the least imputation error for both continuous and categorical variables at each frequency of missingness, and it had the smallest prediction difference when models used imputed laboratory values. In both data sets, MICE had the second least imputation error and prediction difference, followed by the nearest neighbour and mean imputation.
Conclusions: MissForest is a highly accurate method of imputation for missing laboratory data and outperforms other common imputation techniques in terms of imputation error and maintenance of predictive ability with imputed values in two clinical predictive models.
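The evaluation design described in the Methods (hide known values at random, impute them, then score the imputations against the hidden truth) can be illustrated with a minimal sketch. It is not the authors' code. It assumes synthetic data in place of the clinical cohorts, uses only mean imputation and a simple one-nearest-neighbour fill (missForest and MICE are not implemented here), picks the missingness fractions 0.1, 0.2 and 0.3 arbitrarily, and scores with a normalised RMSE chosen for illustration. All function names are hypothetical.

```python
import numpy as np

def mask_mcar(X, frac, rng):
    """Hide a random fraction of entries (missing completely at random)."""
    mask = rng.random(X.shape) < frac
    X_miss = X.copy()
    X_miss[mask] = np.nan
    return X_miss, mask

def impute_mean(X_miss):
    """Replace each missing entry with the mean of its column's observed values."""
    col_means = np.nanmean(X_miss, axis=0)
    return np.where(np.isnan(X_miss), col_means, X_miss)

def impute_nearest(X_miss):
    """Fill each missing entry from the closest row that has that column observed."""
    obs = ~np.isnan(X_miss)
    filled = impute_mean(X_miss)            # fallback if no usable donor exists
    X0 = np.where(obs, X_miss, 0.0)
    for i in np.where(~obs.all(axis=1))[0]:
        shared = obs & obs[i]               # columns observed in both rows
        n_shared = shared.sum(axis=1)
        sq = ((X0 - X0[i]) ** 2 * shared).sum(axis=1)
        dist = np.where(n_shared > 0, sq / np.maximum(n_shared, 1), np.inf)
        dist[i] = np.inf                    # a row cannot donate to itself
        for j in np.where(~obs[i])[0]:
            d = np.where(obs[:, j], dist, np.inf)
            k = int(np.argmin(d))
            if np.isfinite(d[k]):
                filled[i, j] = X_miss[k, j]
    return filled

def nrmse(X_true, X_imp, mask):
    """Normalised RMSE over the artificially hidden entries only."""
    diff = X_true[mask] - X_imp[mask]
    return float(np.sqrt(np.mean(diff ** 2) / np.var(X_true[mask])))

# Synthetic, correlated "laboratory" columns (an assumption, not study data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ rng.normal(size=(1, 5)) + 0.3 * rng.normal(size=(200, 5))

results = {}
for frac in (0.1, 0.2, 0.3):                # assumed missingness frequencies
    X_miss, mask = mask_mcar(X, frac, rng)
    results[frac] = {
        "mean": nrmse(X, impute_mean(X_miss), mask),
        "nearest": nrmse(X, impute_nearest(X_miss), mask),
    }

for frac, scores in results.items():
    print(frac, scores)
```

Because the hidden values are known, each method's error can be measured directly at every missingness level. This mirrors the comparison the abstract reports, although the numbers printed here say nothing about the study's results.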
Pages: 7