How handling missing data may impact conclusions: A comparison of six different imputation methods for categorical questionnaire data

Cited by: 40
Authors
Stavseth, Marianne Riksheim [1]
Clausen, Thomas [1]
Roislien, Jo [1, 2]
Affiliations
[1] Univ Oslo, Norwegian Ctr Addict Res, Inst Clin Med, N-0315 Oslo, Norway
[2] Univ Stavanger, Fac Hlth Sci, Stavanger, Norway
Source
SAGE OPEN MEDICINE | 2019 / Vol. 7
Keywords
Missing data; categorical data; multiple imputation; hot deck imputation; multiple correspondence analysis; complete case analysis; random forests; latent class analysis; HOT DECK IMPUTATION; MULTIPLE-IMPUTATION; MAINTENANCE TREATMENT; INCOMPLETE-DATA; METHADONE; REGRESSION; DISCRETE; VALUES; BIAS;
DOI
10.1177/2050312118822912
Chinese Library Classification (CLC) number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Objectives: Missing data is a recurrent issue in many fields of medical research, particularly in questionnaires. The aim of this article is to describe and compare six conceptually different multiple imputation methods, alongside the commonly used complete case analysis, and to explore whether the choice of methodology for handling missing data might impact clinical conclusions drawn from a regression model when data are categorical.
Methods: In addition to the commonly used complete case analysis, we tested the following six imputation methods: multiple imputation using expectation-maximization with bootstrapping, multiple imputation using multiple correspondence analysis, multiple imputation using latent class analysis, multiple hot deck imputation, and multivariate imputation by chained equations with two different model specifications: logistic regression and random forests. The methods were tested on real data from a questionnaire-based study in the Norwegian opioid maintenance treatment programme.
Results: All methods performed relatively well when the sample size was large (n = 1000). For a smaller sample size (n = 200), the regression estimates depended heavily on the amount of missing data. When more than 20% of the data were missing, complete case analysis, hot deck and random forests in particular had biased estimates with too low coverage. Multiple imputation using multiple correspondence analysis had the best overall performance.
Conclusion: The choice of method for handling missing data has a significant impact on the clinical interpretation of the accompanying statistical analyses. With missing data, the choice of whether to impute or not, and the choice of imputation method, can influence the clinical conclusions drawn from a regression model and should therefore be given sufficient consideration.
Pages: 12
Related papers
50 in total
  • [1] Missing Network Data: A Comparison of Different Imputation Methods
    Krause, Robert W.
    Huisman, Mark
    Steglich, Christian
    Snijders, Tom A. B.
    [J]. 2018 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), 2018 : 159 - 163
  • [2] A comparison of imputation techniques for handling missing data
    Musil, CM
    Warner, CB
    Yobas, PK
    Jones, SL
    [J]. WESTERN JOURNAL OF NURSING RESEARCH, 2002, 24 (07) : 815 - 829
  • [3] Handling Missing Data in Presence of Categorical Variables: a New Imputation Procedure
    Ferrari, Pier Alda
    Barbiero, Alessandro
    Manzi, Giancarlo
    [J]. NEW PERSPECTIVES IN STATISTICAL MODELING AND DATA ANALYSIS, 2011 : 473 - 480
  • [4] Handling missing data for the identification of charged particles in a multilayer detector: A comparison between different imputation methods
    Riggi, S.
    Riggi, D.
    Riggi, F.
    [J]. NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION A-ACCELERATORS SPECTROMETERS DETECTORS AND ASSOCIATED EQUIPMENT, 2015, 780 : 81 - 90
  • [5] Imputation of missing longitudinal data: a comparison of methods
    Engels, JM
    Diehr, P
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2003, 56 (10) : 968 - 976
  • [6] Missing traffic data: comparison of imputation methods
    Li, Yuebiao
    Li, Zhiheng
    Li, Li
    [J]. IET INTELLIGENT TRANSPORT SYSTEMS, 2014, 8 (01) : 51 - 57
  • [7] An Empirical Comparison of Multiple Imputation Methods for Categorical Data
    Akande, Olanrewaju
    Li, Fan
    Reiter, Jerome
    [J]. AMERICAN STATISTICIAN, 2017, 71 (02) : 162 - 170
  • [8] Comparison of missing data imputation methods using weather data
    Nida, Hafiza
    Kashif, Muhammad
    Khan, Muhammad Imran
    Ghamkhar, Madiha
    [J]. PAKISTAN JOURNAL OF AGRICULTURAL SCIENCES, 2023, 60 (02) : 327 - 336
  • [9] Imputation Methods for Handling Missing Dietary Supplement Dosage Data
    Leung, June
    Dwyer, Johanna
    Hibberd, Patricia
    Jacques, Paul
    Rand, William
    [J]. JOURNAL OF RENAL NUTRITION, 2010, 20 (05) : 342 - 347
  • [10] A comparison of imputation methods for the consecutive missing temperature data
    Kim, Hee-Kyung
    Kang, In-Kyeong
    Lee, Jae-Won
    Lee, Yung-Seop
    [J]. KOREAN JOURNAL OF APPLIED STATISTICS, 2016, 29 (03) : 549 - 557