Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers

Cited by: 136
Authors
Blankers, Matthijs [1, 2]
Koeter, Maarten W. J. [1]
Schippers, Gerard M. [1, 2]
Affiliations
[1] Univ Amsterdam, Acad Med Ctr, AIAR, Dept Psychiat, NL-1100 DD Amsterdam, Netherlands
[2] Arkin Acad, Amsterdam, Netherlands
Keywords
Missing data; multiple imputation; Internet; methodology; substance use; attrition; sample; primer
DOI
10.2196/jmir.1448
Chinese Library Classification
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings.
Objective: This paper discusses several statistical approaches to data "missingness" and tests them in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in the analysis, and their strengths and weaknesses are discussed.
Methods: The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained with the different imputation methods were calculated in a bootstrapping simulation study.
Results: In the simulation study, multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Among the tested approaches, Amelia II outperformed the others: it led to the smallest deviation from the reference value (Cohen's d = 0.06) and had the largest coverage percentage of the reference confidence interval (96%).
Conclusions: Multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some often-used approaches (last observation carried forward [LOCF] and complete case analysis) did not perform well, and hence we recommend not using them. More recent versions of some widely used statistical software programs increasingly support the analysis of multiply imputed datasets, making multiple imputation more readily available to less mathematically inclined researchers.
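The kind of comparison the abstract describes can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic (not the study's alcohol consumption dataset), the function names are hypothetical, and the multiple imputation step is a simple bootstrap-based regression imputation pooled with Rubin's rules, not the algorithm used by NORM, MICE, Amelia II, or SPSS MI.

```python
import numpy as np

def make_data(rng, n=2000):
    """Synthetic stand-in data (assumption: not the study's dataset)."""
    x = rng.normal(size=n)                       # fully observed covariate
    noise = rng.exponential(2.0, size=n) - 2.0   # skewed, mean-zero noise
    y = 10.0 + 3.0 * x + noise                   # outcome that will get holes
    return x, y

def induce_mar(rng, x, slope=1.5):
    """Boolean mask of missing y; depends only on the observed x (MAR)."""
    p_missing = 1.0 / (1.0 + np.exp(-slope * x))  # averages about 0.5
    return rng.random(x.size) < p_missing

def multiple_imputation_mean(rng, x, y, missing, m=20):
    """Pool the mean of y over m imputed datasets using Rubin's rules."""
    idx_obs = np.flatnonzero(~missing)
    estimates, variances = [], []
    for _ in range(m):
        # Bootstrap the observed cases so each imputation model differs,
        # which carries parameter uncertainty into the imputations.
        boot = rng.choice(idx_obs, size=idx_obs.size, replace=True)
        slope, intercept = np.polyfit(x[boot], y[boot], 1)
        residuals = y[boot] - (intercept + slope * x[boot])
        # Impute: regression prediction plus a randomly drawn residual.
        y_imp = y.copy()
        y_imp[missing] = (intercept + slope * x[missing]
                          + rng.choice(residuals, size=missing.sum()))
        estimates.append(y_imp.mean())
        variances.append(y_imp.var(ddof=1) / y_imp.size)
    within = np.mean(variances)              # average within-imputation variance
    between = np.var(estimates, ddof=1)      # between-imputation variance
    total_var = within + (1.0 + 1.0 / m) * between
    return np.mean(estimates), np.sqrt(total_var)

def compare_methods(seed=0):
    rng = np.random.default_rng(seed)
    x, y = make_data(rng)
    missing = induce_mar(rng, x)
    y_cc = y[~missing]                       # complete cases only
    mi_mean, mi_se = multiple_imputation_mean(rng, x, y, missing)
    return {
        "reference": y.mean(),               # mean before inducing missingness
        "complete_case": y_cc.mean(),
        "mean_imputation": np.where(missing, y_cc.mean(), y).mean(),
        "multiple_imputation": mi_mean,
        "mi_standard_error": mi_se,
        "fraction_missing": missing.mean(),
    }

if __name__ == "__main__":
    for name, value in compare_methods().items():
        print(f"{name:>20}: {value:.3f}")
```

In this sketch, missingness is more likely for high values of `x`, so the complete case mean is pulled away from the reference mean, and mean imputation reproduces exactly the same point estimate. The regression-based multiple imputation uses the observed covariate to fill in plausible values and should land closer to the reference, under the assumption that the imputation model is reasonable for the data.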
Pages: e54p.1 - e54p.11
Page count: 11
Related papers
50 in total
  • [21] Pragmatic approaches to mitigating missing data and research priorities to assess the effectiveness of interventions
    Anna Kearney
    Anne Daykin
    Ali Heawood
    Athene Lane
    Jane Blazeby
    Mike Clarke
    Paula Williamson
    Carrol Gamble
    Trials, 16
  • [22] Determining Dimensionality with Dichotomous Variables: A Monte Carlo Simulation Study and Applications to Missing Data in Longitudinal Research
    Dai, Ting
    Davey, Adam
    MATHEMATICS, 2023, 11 (06)
  • [23] Pluralism in qualitative research: the impact of different researchers and qualitative approaches on the analysis of qualitative data
    Frost, Nollaig
    Nolas, Sevasti Melissa
    Brooks-Gordon, Belinda
    Esin, Cigdem
    Holt, Amanda
    Mehdizadeh, Leila
    Shinebourne, Pnina
    QUALITATIVE RESEARCH, 2010, 10 (04) : 441 - 460
  • [24] TWO APPROACHES TO EVALUATE MISSING CLINICAL OUTCOME ASSESSMENT RESPONSES: A SIMULATION STUDY
    Qin, S.
    Ma, J.
    Nelson, L.
    VALUE IN HEALTH, 2019, 22 : S319 - S319
  • [25] Missing data techniques in longitudinal epidemiologic research: A case study
    Strezsak, Valerie
    Hong, Jin-Liern
    Dolin, Paul
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2019, 28 : 64 - 65
  • [26] Accounting for bias due to outcome data missing not at random: comparison and illustration of two approaches to probabilistic bias analysis: a simulation study
    Kawabata, Emily
    Major-Smith, Daniel
    Clayton, Gemma L.
    Shapland, Chin Yang
    Morris, Tim P.
    Carter, Alice R.
    Fernandez-Sanles, Alba
    Borges, Maria Carolina
    Tilling, Kate
    Griffith, Gareth J.
    Millard, Louise A. C.
    Smith, George Davey
    Lawlor, Deborah A.
    Hughes, Rachael A.
    BMC MEDICAL RESEARCH METHODOLOGY, 2024, 24 (01)
  • [27] Propensity score approaches with multilevel data: A simulation study
    Borges, Gabriela L.
    Moreira, Marisleane
    Sanni Ali, M.
    Barreto, Mauricio L.
    Smeeth, Liam
    Fiaccone, Rosemeire L.
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2020, 29 : 387 - 387
  • [28] Comparison of different approaches in handling missing data in longitudinal multiple-item patient-reported outcomes: a simulation study
    Yan, Minqian
    Zhou, Lizhi
    Zhao, Chongye
    Shi, Chen
    Ou, Chunquan
    HEALTH AND QUALITY OF LIFE OUTCOMES, 2025, 23 (01)
  • [29] How do MIS researchers handle missing data in survey-based research: A content analysis approach
    Karanja, Erastus
    Zaveri, Jigish
    Ahmed, Ashraf
    INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT, 2013, 33 (05) : 734 - 751
  • [30] A study of hybrid neural network approaches and the effects of missing data on traffic forecasting
    Chen, HB
    Grant-Muller, S
    Mussone, L
    Montgomery, F
    NEURAL COMPUTING & APPLICATIONS, 2001, 10 (03) : 277 - 286