Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers

Cited by: 136
Authors
Blankers, Matthijs [1, 2]
Koeter, Maarten W. J. [1]
Schippers, Gerard M. [1, 2]
Affiliations
[1] Univ Amsterdam, Acad Med Ctr, AIAR, Dept Psychiat, NL-1100 DD Amsterdam, Netherlands
[2] Arkin Acad, Amsterdam, Netherlands
Keywords
Missing data; multiple imputation; Internet; methodology; substance use; attrition; sample; primer
DOI
10.2196/jmir.1448
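Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Discipline classification code
Abstract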
Background: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings.
Objective: This paper discusses several statistical approaches to data "missingness" and tests them in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward [LOCF]) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in the analysis, and their strengths and weaknesses are discussed.
Methods: The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained using the different imputation methods were calculated in a bootstrapping simulation study.
Results: In the simulation study, multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Amelia II outperformed the others: it led to the smallest deviation from the reference value (Cohen's d = 0.06) and had the largest coverage percentage of the reference confidence interval (96%).
Conclusions: The use of multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some of the often-used approaches (LOCF, complete case analysis) did not perform well, and hence we recommend not using them. More recent versions of some widely used statistical software programs show growing support for the analysis of multiply imputed datasets, making multiple imputation more readily available to less mathematically inclined researchers.
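The contrast the abstract draws between complete case analysis, mean imputation, and multiple imputation under MAR can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (not the study's alcohol consumption dataset), the function names are hypothetical, and the imputation step is a simplified bootstrap-based stochastic regression imputation chosen for brevity. It is not the algorithm of NORM, MICE, Amelia II, or SPSS MI. The per-imputation estimates are pooled with Rubin's rules.

```python
import math
import random
import statistics


def simulate(n=2000, seed=1):
    """Synthetic data (assumption): covariate x is always observed, outcome y depends on x."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [2.0 + 1.0 * xi + rng.gauss(0, 1) for xi in x]
    return rng, x, y


def induce_mar(rng, x, y):
    """Delete about 50% of y; the deletion probability depends only on observed x (MAR)."""
    cutoff = statistics.median(x)
    return [None if rng.random() < (0.8 if xi > cutoff else 0.2) else yi
            for xi, yi in zip(x, y)]


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x, plus the residual standard deviation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    b1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    rss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(xs, ys))
    return b0, b1, math.sqrt(rss / (len(xs) - 2))


def multiple_imputation_mean(rng, x, y_mis, m=20):
    """Estimate the mean of y from m imputed datasets, pooled with Rubin's rules."""
    observed = [(a, b) for a, b in zip(x, y_mis) if b is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Refit on a bootstrap resample so parameter uncertainty carries into the imputations.
        boot = rng.choices(observed, k=len(observed))
        b0, b1, sd = fit_line([a for a, _ in boot], [b for _, b in boot])
        # Fill each missing y with its prediction plus random residual noise.
        completed = [b if b is not None else b0 + b1 * a + rng.gauss(0, sd)
                     for a, b in zip(x, y_mis)]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between    # Rubin's total variance
    return pooled, math.sqrt(total)


rng, x, y = simulate()
y_mis = induce_mar(rng, x, y)
observed_y = [v for v in y_mis if v is not None]

true_mean = statistics.fmean(y)
cc_mean = statistics.fmean(observed_y)  # complete case estimate
# Mean imputation leaves the mean at cc_mean but shrinks the spread of the data.
mean_filled = [v if v is not None else cc_mean for v in y_mis]
mi_mean, mi_se = multiple_imputation_mean(rng, x, y_mis)

print(f"reference mean (no missing data): {true_mean:.3f}")
print(f"complete case mean:               {cc_mean:.3f}")
print(f"SD observed vs. mean-imputed:     "
      f"{statistics.stdev(observed_y):.3f} vs. {statistics.stdev(mean_filled):.3f}")
print(f"multiple imputation mean (SE):    {mi_mean:.3f} ({mi_se:.3f})")
```

Because cases with high x are deleted more often, the complete case mean is pulled away from the reference value, and mean imputation inherits that bias while also understating the variability. The imputation model uses x to predict the missing y values, which is why it can recover the reference mean when the MAR assumption holds.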
Pages: e54p.1 / e54p.11
Number of pages: 11
Related papers
50 in total
  • [1] Missing Data in Clinical Research: A Tutorial on Multiple Imputation
    Austin, Peter C.
    White, Ian R.
    Lee, Douglas S.
    van Buuren, Stef
    CANADIAN JOURNAL OF CARDIOLOGY, 2021, 37 (09) : 1322 - 1331
  • [2] A wide range of missing imputation approaches in longitudinal data: a simulation study and real data analysis
    Jahangiri, Mina
    Kazemnejad, Anoshirvan
    Goldfeld, Keith S.
    Daneshpour, Maryam S.
    Mostafaei, Shayan
    Khalili, Davood
    Moghadas, Mohammad Reza
    Akbarzadeh, Mahdi
    BMC MEDICAL RESEARCH METHODOLOGY, 2023, 23 (01)
  • [3] A wide range of missing imputation approaches in longitudinal data: a simulation study and real data analysis
    Mina Jahangiri
    Anoshirvan Kazemnejad
    Keith S. Goldfeld
    Maryam S. Daneshpour
    Shayan Mostafaei
    Davood Khalili
    Mohammad Reza Moghadas
    Mahdi Akbarzadeh
    BMC Medical Research Methodology, 23
  • [4] Application of Missing Data Approaches in Software Testing Research
    Liu, Qin
    Qian, Wen
    Atanas, Atanasov
    2011 INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATIONS AND CONTROL (ICECC), 2011 : 4187 - 4191
  • [5] Missing data in craniometrics: a simulation study
    Gauthier, O
    Landry, PA
    Lapointe, FJ
    ACTA THERIOLOGICA, 2003, 48 (01) : 25 - 34
  • [6] Missing data in craniometrics: a simulation study
    Olivier Gauthier
    Pierre-Alexandre Landry
    François-Joseph Lapointe
    Acta Theriologica, 2003, 48 : 25 - 34
  • [7] Real Research with Fake Data: A Tutorial on Conducting Computer Simulation for Research and Teaching
    Sturman, Michael C.
    ORGANIZATIONAL RESEARCH METHODS, 2025, 28 (01) : 76 - 113
  • [8] Missing data in primary care research: importance, implications and approaches
    Marino, Miguel
    Lucas, Jennifer
    Latour, Emile
    Heintzman, John D.
    FAMILY PRACTICE, 2021, 38 (02) : 200 - 203
  • [9] Multiple imputation for missing data in a longitudinal cohort study: a tutorial based on a detailed case study involving imputation of missing outcome data
    Lee, Katherine J.
    Roberts, Gehan
    Doyle, Lex W.
    Anderson, Peter J.
    Carlin, John B.
    INTERNATIONAL JOURNAL OF SOCIAL RESEARCH METHODOLOGY, 2016, 19 (05) : 575 - 591
  • [10] A cautionary case study of approaches to the treatment of missing data
    Christopher Paul
    William M. Mason
    Daniel McCaffrey
    Sarah A. Fox
    Statistical Methods and Applications, 2008, 17 : 351 - 372