Dealing with missing data in a multi-question depression scale: A comparison of imputation methods

Cited by: 453
Authors
Shrive F.M. [1, 3, 4]
Stuart H. [4, 5]
Quan H. [1, 3, 4]
Ghali W.A. [1, 2, 3, 4]
Affiliations
[1] Department of Community Health Sciences, Faculty of Medicine, University of Calgary, Alta.
[2] Department of Medicine, Faculty of Medicine, University of Calgary, Alta.
[3] Centre for Health and Policy Studies, Faculty of Medicine, University of Calgary, Alta.
[4] Centre for the Advancement of Health, Calgary Health Region, Calgary, Alta.
[5] Department of Community Health Sciences, Faculty of Medicine, Queen's University, Kingston, Ont.
Keywords
Multiple Imputation; Kappa Statistic; Imputation Method; Single Regression; Preceding Question;
DOI
10.1186/1471-2288-6-57
Abstract
Background: Missing data present a challenge to many research projects. The problem is often pronounced in studies that use self-report scales, and literature addressing strategies for dealing with missing data in such circumstances is scarce. The objective of this study was to compare six imputation techniques for dealing with missing data in the Zung Self-reported Depression scale (SDS).

Methods: 1580 participants from a surgical outcomes study completed the SDS. The SDS is a 20-question scale that respondents complete by circling a value of 1 to 4 for each question. The responses are summed, and respondents are classified as exhibiting depressive symptoms when their total score is over 40. Missing values were simulated by randomly selecting questions whose values were then deleted (a missing completely at random simulation). Missing at random and missing not at random simulations were also completed. Six imputation methods were then considered: 1) multiple imputation (MI), 2) single regression, 3) individual mean, 4) overall mean, 5) participant's preceding response, and 6) random selection of a value from 1 to 4. For each method, the imputed mean SDS score and standard deviation were compared to the population statistics. The Spearman correlation coefficient, percent misclassified and the Kappa statistic were also calculated.

Results: When 10% of values were missing, all the imputation methods except random selection produced Kappa statistics greater than 0.80, indicating 'near perfect' agreement. MI produced the most valid imputed values, with a high Kappa statistic (0.89), although both single regression and individual mean imputation also produced favorable results. As the percent of missing information increased to 30%, or when unbalanced missing data were introduced, MI maintained a high Kappa statistic. The individual mean and single regression methods produced Kappas in the 'substantial agreement' range (0.76 and 0.74 respectively).

Conclusion: Multiple imputation is the most accurate method for dealing with missing data in most of the missing data scenarios we assessed for the SDS. Imputing the individual's mean is also an appropriate and simple method for dealing with missing data, and one that may be more interpretable to the majority of medical readers. Researchers should consider conducting methodological assessments such as this one when confronted with missing data. The optimal method should balance validity, ease of interpretability for readers, and analysis expertise of the research team. © 2006 Shrive et al; licensee BioMed Central Ltd.
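The evaluation loop described in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the responses are synthetic (uniform random values from 1 to 4), missingness is applied per item cell uniformly at random, only the individual mean method is shown, and all function names (`delete_mcar`, `impute_individual_mean`, `classify`, `cohen_kappa`) are hypothetical. The scale length (20 items) and the cutoff (total score over 40) come from the abstract.

```python
import random

N_ITEMS = 20   # the SDS has 20 questions, each answered 1 to 4
CUTOFF = 40    # a total score over 40 is classified as depressive symptoms


def delete_mcar(responses, fraction, rng):
    """Return a copy in which `fraction` of all item values are set to None.

    Cells are chosen uniformly at random (a missing completely at random
    pattern). Assumption: the study's exact deletion scheme may differ.
    """
    cells = [(i, j) for i, row in enumerate(responses) for j in range(len(row))]
    missing = set(rng.sample(cells, round(fraction * len(cells))))
    return [[None if (i, j) in missing else v for j, v in enumerate(row)]
            for i, row in enumerate(responses)]


def impute_individual_mean(row):
    """Replace each missing item with the mean of that respondent's observed items."""
    observed = [v for v in row if v is not None]
    if not observed:
        raise ValueError("respondent has no observed items")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in row]


def classify(row):
    """True when the summed score exceeds the cutoff."""
    return sum(row) > CUTOFF


def cohen_kappa(a, b):
    """Cohen's kappa for two lists of boolean classifications."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1:
        # Both lists are constant and identical; kappa is undefined,
        # so 1.0 is returned here as a convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Synthetic data only: 200 hypothetical respondents with uniform random answers.
rng = random.Random(0)
complete = [[rng.randint(1, 4) for _ in range(N_ITEMS)] for _ in range(200)]

with_gaps = delete_mcar(complete, 0.10, rng)             # 10% of values missing
imputed = [impute_individual_mean(row) for row in with_gaps]

kappa = cohen_kappa([classify(r) for r in complete],
                    [classify(r) for r in imputed])
print(f"kappa between true and imputed classification: {kappa:.2f}")
```

Because the data are synthetic, the printed kappa does not correspond to the values reported in the Results. Comparing other methods would mean swapping `impute_individual_mean` for another imputation function and repeating the comparison at each missingness level.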