Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers

Times cited: 136
Authors
Blankers, Matthijs [1, 2]
Koeter, Maarten W. J. [1]
Schippers, Gerard M. [1, 2]
Affiliations
[1] Univ Amsterdam, Acad Med Ctr, AIAR, Dept Psychiat, NL-1100 DD Amsterdam, Netherlands
[2] Arkin Acad, Amsterdam, Netherlands
Keywords
Missing data; multiple imputation; Internet; methodology; substance use; attrition; sample; primer
DOI
10.2196/jmir.1448
Chinese Library Classification number
R19 [Health care organization and services (health services management)]
Abstract
Background: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings.
Objective: In this paper, several statistical approaches to data "missingness" are discussed and tested in a simulation study. The analysis includes basic approaches (complete case analysis, mean imputation, and last observation carried forward [LOCF]) and advanced methods (expectation maximization, regression imputation, and multiple imputation), and their strengths and weaknesses are discussed.
Methods: The dataset used for the simulation was obtained from a prospective cohort study that followed participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missing at random (MAR) missingness was induced in a selected variable for 50% of the cases. The validity, reliability, and coverage of the estimates obtained with the different imputation methods were calculated in a bootstrapping simulation study.
Results: In the simulation study, multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Amelia II outperformed the other tested approaches. It led to the smallest deviation from the reference value (Cohen's d = 0.06) and had the largest coverage percentage of the reference confidence interval (96%).
Conclusions: Multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some often-used approaches (LOCF, complete case analysis) did not perform well; hence, we recommend against using them. More recent versions of some widely used statistical software programs offer growing support for analyzing multiply imputed datasets, which makes multiple imputation more readily available to less mathematically inclined researchers.
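The abstract compares basic approaches (complete case analysis, mean imputation, LOCF) with multiple imputation under MAR missingness. The Python sketch below illustrates that kind of comparison under stated assumptions. It does not reproduce the authors' data, their bootstrap design, or the NORM, MICE, Amelia II, or SPSS MI implementations.

Everything in it is hypothetical:
- Baseline (x) and follow-up (y) weekly drink counts are drawn from gamma distributions to mimic skewed consumption data.
- Follow-up values are deleted with a probability that depends only on the observed baseline, which makes the missingness MAR.
- Multiple imputation is done by Bayesian linear-regression imputation and pooled with Rubin's rules.
- Deviation is expressed as a Cohen's d-style standardized difference from the complete-data mean.
- The helper names `simulate`, `induce_mar`, and `multiple_imputation` are illustrative only.

```python
import numpy as np


def simulate(rng, n=2000):
    """Hypothetical baseline (x) and follow-up (y) drinks per week; right-skewed, non-normal."""
    x = rng.gamma(shape=2.0, scale=10.0, size=n)
    y = 0.6 * x + rng.gamma(shape=2.0, scale=4.0, size=n)
    return x, y


def induce_mar(rng, x, y):
    """Delete y with a probability that depends only on the observed x (MAR)."""
    p_missing = 1.0 / (1.0 + np.exp(-(x - np.median(x)) / 5.0))
    y_obs = y.copy()
    y_obs[rng.random(x.size) < p_missing] = np.nan
    return y_obs


def multiple_imputation(rng, x, y_obs, m=20):
    """Bayesian linear-regression imputation of y from x, pooled with Rubin's rules."""
    miss = np.isnan(y_obs)
    X_obs = np.column_stack([np.ones((~miss).sum()), x[~miss]])
    X_mis = np.column_stack([np.ones(miss.sum()), x[miss]])
    beta_hat, *_ = np.linalg.lstsq(X_obs, y_obs[~miss], rcond=None)
    resid = y_obs[~miss] - X_obs @ beta_hat
    df = X_obs.shape[0] - X_obs.shape[1]
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    estimates, variances = [], []
    for _ in range(m):
        # Draw sigma^2 and beta from their posterior so the imputations carry parameter uncertainty
        sigma2 = resid @ resid / rng.chisquare(df)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        y_imp = y_obs.copy()
        y_imp[miss] = X_mis @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
        estimates.append(y_imp.mean())
        variances.append(y_imp.var(ddof=1) / y_imp.size)  # squared SE of the mean
    q_bar = np.mean(estimates)                # pooled point estimate
    w = np.mean(variances)                    # within-imputation variance
    b = np.var(estimates, ddof=1)             # between-imputation variance
    return q_bar, np.sqrt(w + (1 + 1 / m) * b)  # estimate, pooled SE


rng = np.random.default_rng(1448)
x, y = simulate(rng)
y_obs = induce_mar(rng, x, y)
observed = ~np.isnan(y_obs)

reference = y.mean()  # complete-data estimate, computed before deletion
sd_ref = y.std(ddof=1)

results = {
    # Complete case analysis: drop every case with a missing follow-up
    "complete case": y_obs[observed].mean(),
    # Mean imputation: fill gaps with the observed mean (same point estimate as CCA)
    "mean imputation": np.where(observed, y_obs, np.nanmean(y_obs)).mean(),
    # LOCF: carry the last observed value (here, the baseline) forward
    "LOCF": np.where(observed, y_obs, x).mean(),
    "multiple imputation": multiple_imputation(rng, x, y_obs)[0],
}
cohens_d = {name: (est - reference) / sd_ref for name, est in results.items()}
for name, d in cohens_d.items():
    print(f"{name:>20}: deviation d = {d:+.3f}")
```

In this setup, heavier baseline drinkers are more likely to be missing. That pulls the complete case mean (and the identical mean-imputation point estimate) toward lighter drinkers. The regression-based imputations use the observed baseline to correct for this. Mean imputation also understates the standard error, because it pads the sample with identical values. The normal residual model can produce negative imputed counts, so a real analysis of skewed counts would call for a better-suited imputation model.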
Pages: e54p.1-e54p.11
Number of pages: 11
Related papers (50 in total)
  • [31] A Study of Hybrid Neural Network Approaches and the Effects of Missing Data on Traffic Forecasting
    Chen, Haibo
    Grant-Muller, Susan
    Mussone, Lorenzo
    Montgomery, Frank
    Neural Computing & Applications, 2001, 10: 277-286
  • [32] Estimating range of influence in case of missing spatial data: a simulation study on binary data
    Bihrmann, Kristine
    Ersboll, Annette K.
    International Journal of Health Geographics, 2015, 14
  • [34] Attrition Bias Related to Missing Outcome Data: A Longitudinal Simulation Study
    Lewin, Antoine
    Brondeel, Ruben
    Benmarhnia, Tarik
    Thomas, Frederique
    Chaix, Basile
    Epidemiology, 2018, 29 (1): 87-95
  • [35] Consequences of handling missing data for treatment response in osteoarthritis: a simulation study
    Olsen, I. C.
    Kvien, T. K.
    Uhlig, T.
    Osteoarthritis and Cartilage, 2012, 20 (8): 822-828
  • [36] Investigating Variable Selection Techniques Under Missing Data: A Simulation Study
    Bain, Catherine
    Shi, Dingjing
    Quantitative Psychology, IMPS 2023, 2024, 452: 109-119
  • [37] Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: A simulation study
    Sisk, Rose
    Sperrin, Matthew
    Peek, Niels
    van Smeden, Maarten
    Martin, Glen Philip
    Statistical Methods in Medical Research, 2023, 32 (8): 1461-1477
  • [38] Why Are Data Missing in Clinical Data Warehouses? A Simulation Study of How Data Are Processed (and Can Be Lost)
    Priou, Sonia
    Lame, Guillaume
    Jankovic, Marija
    Chatellier, Gilles
    Bey, Romain
    Tournigand, Christophe
    Daniel, Christel
    Kempf, Emmanuelle
    Caring Is Sharing: Exploiting the Value in Data for Health and Innovation. Proceedings of MIE 2023, 2023, 302: 202-206
  • [39] Is there a role for expectation maximization imputation in addressing missing data in research using WOMAC questionnaire? Comparison to the standard mean approach and a tutorial
    Ghomrawi, Hassan M. K.
    Mandl, Lisa A.
    Rutledge, John
    Alexiades, Michael M.
    Mazumdar, Madhu
    BMC Musculoskeletal Disorders, 2011, 12