Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers

Cited by: 136
Authors
Blankers, Matthijs [1, 2]
Koeter, Maarten W. J. [1]
Schippers, Gerard M. [1, 2]
Affiliations
[1] Univ Amsterdam, Acad Med Ctr, AIAR, Dept Psychiat, NL-1100 DD Amsterdam, Netherlands
[2] Arkin Acad, Amsterdam, Netherlands
Keywords
Missing data; multiple imputation; Internet; methodology; MULTIPLE IMPUTATION; SUBSTANCE USE; ATTRITION; SAMPLE; PRIMER;
DOI
10.2196/jmir.1448
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Abstract
Background: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings.
Objective: This paper discusses several statistical approaches to data "missingness" and tests them in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in the analysis, and their strengths and weaknesses are discussed.
Methods: The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained using the different imputation methods were calculated by performing a bootstrapping simulation study.
Results: In the simulation study, multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Among the tested approaches, Amelia II performed best: it led to the smallest deviation from the reference value (Cohen's d = 0.06) and had the largest coverage percentage of the reference confidence interval (96%).
Conclusions: The use of multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some often-used approaches (last observation carried forward and complete case analysis) did not perform well, and we therefore recommend against using them. Recent versions of some widely used statistical software programs increasingly support the analysis of multiply imputed datasets, making multiple imputation more readily available to less mathematically inclined researchers.
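The contrast the abstract draws between basic approaches and multiple imputation can be illustrated with a small sketch. This does not reproduce the study. It assumes synthetic Gaussian data rather than the alcohol consumption counts, a hypothetical logistic MAR mechanism that leaves roughly 50% of outcomes missing, and a simple bootstrap-based regression imputation standing in for the NORM, MICE, Amelia II, and SPSS MI programs. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def simulate(n=2000, seed=0):
    """Synthetic data (assumption): covariate x fully observed, outcome y MAR."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 2.0 + 1.5 * x + rng.normal(size=n)
    # MAR: the chance that y is missing depends only on the observed x.
    p_miss = 1.0 / (1.0 + np.exp(-2.0 * x))   # averages about 0.5
    missing = rng.random(n) < p_miss
    return x, y, missing

def complete_case_mean(y, missing):
    """Basic approach: drop every case with a missing outcome."""
    return y[~missing].mean()

def mean_imputation(y, missing):
    """Basic approach: fill each gap with the observed mean."""
    filled = y.copy()
    filled[missing] = y[~missing].mean()
    return filled.mean()

def multiple_imputation_mean(x, y, missing, m=20, seed=1):
    """Simplified multiple imputation, pooled with Rubin's rules.

    Each of the m imputations refits a linear regression on a bootstrap
    sample of the observed cases, so that parameter uncertainty carries
    over, then draws the missing values from the fitted model plus noise.
    """
    rng = np.random.default_rng(seed)
    obs_idx = np.flatnonzero(~missing)
    n = len(y)
    estimates, variances = [], []
    for _ in range(m):
        boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
        design = np.column_stack([np.ones(boot.size), x[boot]])
        beta, *_ = np.linalg.lstsq(design, y[boot], rcond=None)
        sigma = (y[boot] - design @ beta).std(ddof=2)
        filled = y.copy()
        filled[missing] = (beta[0] + beta[1] * x[missing]
                           + rng.normal(scale=sigma, size=missing.sum()))
        estimates.append(filled.mean())
        variances.append(filled.var(ddof=1) / n)
    pooled = np.mean(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    total_var = within + (1.0 + 1.0 / m) * between   # Rubin's rules
    return pooled, total_var

if __name__ == "__main__":
    x, y, missing = simulate()
    mi, mi_var = multiple_imputation_mean(x, y, missing)
    print("full-data mean      :", y.mean())
    print("complete case mean  :", complete_case_mean(y, missing))
    print("mean imputation     :", mean_imputation(y, missing))
    print("multiple imputation :", mi, "+/-", 1.96 * np.sqrt(mi_var))
```

In this setup the complete case and mean imputation estimates coincide and drift away from the full-data mean, because cases with high x are more likely to be missing. The imputation model uses x and so can correct for that. Real count data would call for an imputation model suited to nonnormal outcomes.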
Pages: e54p.1 - e54p.11
Number of pages: 11
Related papers
50 in total
  • [11] A cautionary case study of approaches to the treatment of missing data
    Paul, Christopher
    Mason, William M.
    McCaffrey, Daniel
    Fox, Sarah A.
    STATISTICAL METHODS AND APPLICATIONS, 2008, 17 (03) : 351 - 372
  • [12] Imputation of Missing Covariate Data Prior to Propensity Score Analysis: A Tutorial and Evaluation of the Robustness of Practical Approaches
    Leite, Walter L.
    Aydin, Burak
    Cetin-Berber, Dee D.
    EVALUATION REVIEW, 2021, 45 (1-2) : 34 - 69
  • [13] Approaches to Handling Missing Data Within Developmental and Behavioral Pediatric Research
    Aylward, Brandon S.
    Anderson, Rawni A.
    Nelson, Timothy D.
    JOURNAL OF DEVELOPMENTAL AND BEHAVIORAL PEDIATRICS, 2010, 31 (01) : 54 - 60
  • [14] New approaches to missing data in psychological research: Introduction to the special section
    West, SG
    PSYCHOLOGICAL METHODS, 2001, 6 (04) : 315 - 316
  • [15] TUTORIAL: AI research without coding: The art of fighting without fighting: Data science for qualitative researchers
    Ciechanowski, Leon
    Jemielniak, Dariusz
    Gloor, Peter A.
    JOURNAL OF BUSINESS RESEARCH, 2020, 117 : 322 - 330
  • [16] Accounting for missing data caused by drug cessation in observational comparative effectiveness research: a simulation study
    Mongin, Denis
    Lauper, Kim
    Finckh, Axel
    Frisell, Thomas
    Courvoisier, Delphine Sophie
    ANNALS OF THE RHEUMATIC DISEASES, 2022, 81 (05) : 729 - 736
  • [17] Missing data in substance abuse research? Researchers' reporting practices of sexual orientation and gender identity
    Flentje, Annesa
    Bacca, Cristina L.
    Cochran, Bryan N.
    DRUG AND ALCOHOL DEPENDENCE, 2015, 147 : 280 - 284
  • [18] Pragmatic approaches to mitigating missing data and research priorities to assess the effectiveness of interventions
    Kearney, Anna
    Daykin, Anne
    Heawood, Ali
    Lane, Athene
    Blazeby, Jane
    Clarke, Mike
    Williamson, Paula
    Gamble, Carrol
    TRIALS, 2015, 16
  • [19] Missing Data in Substance Abuse Treatment Research: Current Methods and Modern Approaches
    McPherson, Sterling
    Barbosa-Leiker, Celestina
    Burns, G. Leonard
    Howell, Donelle
    Roll, John
    EXPERIMENTAL AND CLINICAL PSYCHOPHARMACOLOGY, 2012, 20 (03) : 243 - 250
  • [20] Investigating Parallel Analysis in the Context of Missing Data: A Simulation Study Comparing Six Missing Data Methods
    Goretzko, David
    Heumann, Christian
    Buehner, Markus
    EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 2020, 80 (04) : 756 - 774