Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research

Cited by: 6
Authors
Zhou, Yizhao [1, 2]
Shi, Jiasheng [1, 2]
Stein, Ronen [2, 3]
Liu, Xiaokang [1]
Baldassano, Robert N. [2, 3]
Forrest, Christopher B. [2, 3]
Chen, Yong [1]
Huang, Jing [1, 2, 4]
Affiliations
[1] Univ Penn, Perelman Sch Med, Dept Biostat Epidemiol & Informat, Philadelphia, PA USA
[2] Childrens Hosp Philadelphia, Dept Pediat, Philadelphia, PA USA
[3] Univ Penn, Perelman Sch Med, Dept Pediat, Philadelphia, PA USA
[4] Univ Penn, Perelman Sch Med, Dept Biostat Epidemiol & Informat, Blockley Hall,Rm 625,423 Guardian Dr, Philadelphia, PA 19104 USA
Keywords
electronic health records; empirical study; missing data; multiple imputation; inference; biases; time
DOI
10.1093/jamia/ocad066
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Objectives: The impact of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) may vary with the type and pattern of missingness. In this study, we aimed to quantify these impacts and to compare the performance of different imputation methods.

Materials and Methods: We conducted an empirical (simulation) study to quantify the bias and power loss in estimating treatment effects in CER using EHR data. We considered various missing-data scenarios and used propensity scores to control for confounding. We compared the performance of multiple imputation and spline smoothing in handling missing data.

Results: When missingness depended on the stochastic progression of disease and on medical practice patterns, spline smoothing produced results close to those obtained with no missing data. Compared with multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. Multiple imputation can still reduce study bias and power loss in some restrictive scenarios, e.g., when missingness did not depend on the stochastic process of disease progression.

Discussion and Conclusion: Missing data in EHRs could lead to biased estimates of treatment effects and false-negative findings in CER, even after the missing data were imputed. When using EHRs as a data resource for CER, it is important to leverage the temporal information of disease trajectories to impute missing values, and to consider the missing rate and the effect size when choosing an imputation method.
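The abstract reports bias and power loss for treatment-effect estimates after imputation but does not spell out the computations. As a minimal sketch under stated assumptions (not the authors' code), the snippet below pools estimates across imputed datasets with Rubin's rules and summarizes simulation replicates as bias against the known simulated effect and as empirical power from a two-sided Wald z-test. The function names `pool_rubin` and `summarize_replicates`, the normal approximation, and alpha = 0.05 are illustrative assumptions.

```python
import math
from statistics import NormalDist, fmean


def pool_rubin(estimates, variances):
    """Hypothetical helper: combine m per-imputation results with Rubin's rules.

    estimates: treatment-effect estimates q_1..q_m, one per imputed dataset
    variances: their squared standard errors u_1..u_m
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = fmean(estimates)  # pooled point estimate
    w = fmean(variances)  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = w + (1 + 1 / m) * b  # total variance
    return q_bar, t


def summarize_replicates(results, true_effect, alpha=0.05):
    """Hypothetical helper: bias and empirical power over simulation replicates.

    results: list of (estimate, variance) pairs, one per simulated dataset.
    Power is the share of replicates whose two-sided Wald z-test rejects
    H0: effect = 0 (assumed normal approximation, not a t reference).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    estimates = [est for est, _ in results]
    rejections = sum(abs(est) / math.sqrt(var) > z_crit for est, var in results)
    bias = fmean(estimates) - true_effect  # mean estimate minus simulated truth
    power = rejections / len(results)
    return bias, power
```

Under these assumptions, the power loss for a scenario would be the empirical power on the complete (no-missing) data minus the power on the imputed data. With few imputations, Rubin's rules usually use a t reference distribution with adjusted degrees of freedom rather than the normal approximation used here.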
Pages: 1246-1256
Number of pages: 11