Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers

Cited by: 136
Authors
Blankers, Matthijs [1 ,2 ]
Koeter, Maarten W. J. [1 ]
Schippers, Gerard M. [1 ,2 ]
Affiliations
[1] Univ Amsterdam, Acad Med Ctr, AIAR, Dept Psychiat, NL-1100 DD Amsterdam, Netherlands
[2] Arkin Acad, Amsterdam, Netherlands
Keywords
Missing data; multiple imputation; Internet; methodology; MULTIPLE IMPUTATION; SUBSTANCE USE; ATTRITION; SAMPLE; PRIMER;
DOI
10.2196/jmir.1448
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Abstract
Background: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings.
Objective: In this paper, several statistical approaches to data "missingness" are discussed and tested in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in this analysis, and their strengths and weaknesses are discussed.
Methods: The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained using the different imputation methods were calculated by performing a bootstrapping simulation study.
Results: In the simulation study, the use of multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Among the tested approaches, Amelia II outperformed the others: it led to the smallest deviation from the reference value (Cohen's d = 0.06) and had the largest coverage percentage of the reference confidence interval (96%).
Conclusions: The use of multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some of the often-used approaches (last observation carried forward and complete case analysis) did not perform well, so we recommend not using them. More recent versions of some widely used statistical software programs increasingly support the analysis of multiply imputed datasets, making multiple imputation more readily available to less mathematically inclined researchers.
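The comparison described in the abstract can be illustrated with a small toy sketch. It is not the authors' procedure and does not use NORM, MICE, Amelia II, or SPSS MI. It assumes hypothetical, normally distributed data (the study data were nonnormal drinking counts), induces MAR missingness in about half of the cases, and compares the complete case mean with a simplified multiple imputation (bootstrap plus stochastic regression imputation, pooled with Rubin's rules). All function names and parameter values are illustrative assumptions.

```python
import random
import statistics


def simulate(n=2000, seed=1):
    """Hypothetical toy data: x is always observed, y depends on x."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [10 + 2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def induce_mar(xs, ys, seed=2):
    """Delete y with a probability that depends only on the observed x (MAR)."""
    rng = random.Random(seed)
    # Assumed mechanism: 80% missing when x > 0, 20% otherwise (about 50% overall).
    return [None if rng.random() < (0.8 if x > 0 else 0.2) else y
            for x, y in zip(xs, ys)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def multiple_imputation_mean(xs, ys_mis, m=20, seed=3):
    """Simplified MI estimate of mean(y), pooled with Rubin's rules."""
    rng = random.Random(seed)
    observed_pairs = [(x, y) for x, y in zip(xs, ys_mis) if y is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Bootstrap the observed cases so each imputation reflects
        # uncertainty in the regression parameters.
        boot = [rng.choice(observed_pairs) for _ in observed_pairs]
        a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
        # Fill each missing y with a prediction plus random residual noise.
        completed = [y if y is not None else a + b * x + rng.gauss(0, sd)
                     for x, y in zip(xs, ys_mis)]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    return pooled, within + (1 + 1 / m) * between


xs, ys = simulate()
ys_mis = induce_mar(xs, ys)
missing_fraction = sum(y is None for y in ys_mis) / len(ys_mis)
true_mean = statistics.fmean(ys)  # reference value from the full data
# Complete case analysis; mean imputation gives the same point estimate.
cc_mean = statistics.fmean(y for y in ys_mis if y is not None)
mi_mean, mi_var = multiple_imputation_mean(xs, ys_mis)

print(f"missing fraction:   {missing_fraction:.2f}")
print(f"reference mean:     {true_mean:.2f}")
print(f"complete case mean: {cc_mean:.2f}")
print(f"MI pooled mean:     {mi_mean:.2f} (SE {mi_var ** 0.5:.2f})")
```

In this toy setup the complete case mean is biased because cases with high x, and therefore high y, are more likely to be missing, while the imputation model recovers the reference mean by using x. A real analysis would rely on a dedicated multiple imputation package rather than this hand-rolled sketch.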
Pages: e54p.1 / e54p.11
Number of pages: 11