Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research

Cited by: 6
Authors
Zhou, Yizhao [1, 2]
Shi, Jiasheng [1, 2]
Stein, Ronen [2, 3]
Liu, Xiaokang [1]
Baldassano, Robert N. [2, 3]
Forrest, Christopher B. [2, 3]
Chen, Yong [1]
Huang, Jing [1, 2, 4]
Affiliations
[1] Univ Penn, Perelman Sch Med, Dept Biostat Epidemiol & Informat, Philadelphia, PA USA
[2] Childrens Hosp Philadelphia, Dept Pediat, Philadelphia, PA USA
[3] Univ Penn, Perelman Sch Med, Dept Pediat, Philadelphia, PA USA
[4] Univ Penn, Perelman Sch Med, Dept Biostat Epidemiol & Informat, Blockley Hall,Rm 625,423 Guardian Dr, Philadelphia, PA 19104 USA
Keywords
electronic health records; empirical study; missing data; multiple imputation; inference; biases; time
DOI
10.1093/jamia/ocad066
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Objectives: The impact of missing data on comparative effectiveness research (CER) using electronic health records (EHRs) may vary with the type and pattern of missingness. In this study, we aimed to quantify these impacts and compare the performance of different imputation methods. Materials and Methods: We conducted an empirical (simulation) study to quantify the bias and power loss in estimating treatment effects in CER using EHR data. We considered various missing-data scenarios and used propensity scores to control for confounding. We compared the performance of multiple imputation and spline smoothing in handling missing data. Results: When missing data depended on the stochastic progression of disease and medical practice patterns, the spline smoothing method produced results close to those obtained when there were no missing data. Compared with multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. Multiple imputation can still reduce study bias and power loss in some restrictive scenarios, e.g., when missing data did not depend on the stochastic process of disease progression. Discussion and Conclusion: Missing data in EHRs could lead to biased estimates of treatment effects and false-negative findings in CER even after missing data were imputed. It is important to leverage the temporal information of the disease trajectory to impute missing values when using EHRs as a data resource for CER, and to consider the missing rate and the effect size when choosing an imputation method.
Pages: 1246-1256
Page count: 11