A nonparametric multiple imputation approach for missing categorical data

Cited by: 5
Authors
Zhou, Muhan [1]
He, Yulei [2]
Yu, Mandi [3]
Hsu, Chiu-Hsieh [1]
Affiliations
[1] Univ Arizona, Dept Epidemiol & Biostat, Mel & Enid Zuckerman Coll Publ Hlth, 1295 N Martin Ave, Tucson, AZ 85724 USA
[2] Ctr Dis Control & Prevent, Div Res & Methodol, Natl Ctr Hlth Stat, Hyattsville, MD 20782 USA
[3] NCI, Div Canc Control & Populat Sci, Rockville, MD 20850 USA
Keywords
Categorical data; Double robustness; Missing at Random; Multiple imputation; Nearest neighbour;
DOI
10.1186/s12874-017-0360-2
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Abstract
Background: Incomplete categorical variables with more than two categories are common in public health data. However, most existing missing-data methods do not use the information from nonresponse (missingness) probabilities.
Methods: We propose a nearest-neighbour multiple imputation approach to impute a missing-at-random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring the distance between each missing value and the non-missing values. The distance is calculated from a predictive score derived from two working models: a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme balances the contributions of the two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented.
Results: The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme, even under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions of the two working models and achieve optimal results for the proposed method.
Conclusions: We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
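The donor-selection step described under Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact distance formula, so the weighted Euclidean form, the weight `w_outcome`, the donor-set size `k`, the number of imputations `m`, and all function names below are assumptions made for illustration. The predictive scores are taken as precomputed inputs (in practice they would come from the fitted outcome and missingness models), and any bootstrap or variance-combining step is omitted.

```python
import math
import random
from collections import Counter


def distance(i, j, outcome_scores, missing_scores, w_outcome):
    """Weighted Euclidean distance between observations i and j.

    Assumed form: the outcome-model scores (one tuple per observation) get
    weight w_outcome and the missingness-model score gets 1 - w_outcome.
    """
    d_out = sum((a - b) ** 2 for a, b in zip(outcome_scores[i], outcome_scores[j]))
    d_mis = (missing_scores[i] - missing_scores[j]) ** 2
    return math.sqrt(w_outcome * d_out + (1.0 - w_outcome) * d_mis)


def impute_once(outcomes, outcome_scores, missing_scores, w_outcome, k, rng):
    """Fill each missing outcome (None) with the value of a random donor
    drawn from the k observed cases with the smallest distances."""
    observed = [i for i, y in enumerate(outcomes) if y is not None]
    completed = list(outcomes)
    for i, y in enumerate(outcomes):
        if y is None:
            donors = sorted(
                observed,
                key=lambda j: distance(i, j, outcome_scores, missing_scores, w_outcome),
            )[:k]
            completed[i] = outcomes[rng.choice(donors)]
    return completed


def estimate_proportions(outcomes, outcome_scores, missing_scores,
                         w_outcome=0.8, k=3, m=10, seed=0):
    """Average the category proportions over m imputed data sets."""
    rng = random.Random(seed)
    categories = sorted({y for y in outcomes if y is not None})
    totals = Counter()
    for _ in range(m):
        completed = impute_once(outcomes, outcome_scores, missing_scores,
                                w_outcome, k, rng)
        counts = Counter(completed)
        for c in categories:
            totals[c] += counts[c] / len(completed)
    return {c: totals[c] / m for c in categories}


# Toy data (made up for illustration): three categories, two missing outcomes.
toy_outcomes = ["a", "b", "c", "a", None, "b", None, "c"]
toy_outcome_scores = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.0, 0.3),
                      (0.8, 0.2), (1.0, 0.0), (0.1, 0.9), (0.3, 0.7)]
toy_missing_scores = [0.2, 0.6, 0.4, 0.1, 0.7, 0.5, 0.3, 0.4]
toy_proportions = estimate_proportions(toy_outcomes, toy_outcome_scores,
                                       toy_missing_scores)
```

Varying `w_outcome` between 0 and 1 shifts the distance from relying only on the missingness model to relying only on the outcome model, which is one way to explore the sensitivity to the weights that the Results mention.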
Pages: 12
Related papers
50 in total
  • [1] A nonparametric multiple imputation approach for missing categorical data
    Muhan Zhou
    Yulei He
    Mandi Yu
    Chiu-Hsieh Hsu
    [J]. BMC Medical Research Methodology, 17
  • [2] Latent class based multiple imputation approach for missing categorical data
    Gebregziabher, Mulugeta
    DeSantis, Stacia M.
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2010, 140 (11) : 3252 - 3262
  • [3] Missing Categorical Data Imputation Approach Based on Similarity
    Wu, Sen
    Feng, Xiaodong
    Han, Yushan
    Wang, Qiang
    [J]. PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012, : 2827 - 2832
  • [4] A Nonparametric Multiple Imputation Approach for Data with Missing Covariate Values with Application to Colorectal Adenoma Data
    Hsu, Chiu-Hsieh
    Long, Qi
    Li, Yisheng
    Jacobs, Elizabeth
    [J]. JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2014, 24 (03) : 634 - 648
  • [5] MULTIPLY ROBUST NONPARAMETRIC MULTIPLE IMPUTATION FOR THE TREATMENT OF MISSING DATA
    Chen, Sixia
    Haziza, David
    [J]. STATISTICA SINICA, 2019, 29 (04) : 2035 - 2053
  • [6] Categorical missing data imputation approach via sparse representation
    Shao, Xiaochen
    Wu, Sen
    Feng, Xiaodong
    Song, Rui
    [J]. INTERNATIONAL JOURNAL OF SERVICES TECHNOLOGY AND MANAGEMENT, 2016, 22 (3-5) : 256 - 270
  • [7] DOUBLY ROBUST NONPARAMETRIC MULTIPLE IMPUTATION FOR IGNORABLE MISSING DATA
    Long, Qi
    Hsu, Chiu-Hsieh
    Li, Yisheng
    [J]. STATISTICA SINICA, 2012, 22 (01) : 149 - 172
  • [8] Multiple imputation of unordered categorical missing data: A comparison of the multivariate normal imputation and multiple imputation by chained equations
    Karangwa, Innocent
    Kotze, Danelle
    Blignaut, Renette
    [J]. BRAZILIAN JOURNAL OF PROBABILITY AND STATISTICS, 2016, 30 (04) : 521 - 539
  • [9] A Novel Nonparametric Multiple Imputation Algorithm for Estimating Missing Data
    Gheyas, Iffat A.
    Smith, Leslie S.
    [J]. WORLD CONGRESS ON ENGINEERING 2009, VOLS I AND II, 2009, : 1281 - 1286
  • [10] Nonparametric statistical inference and imputation for incomplete categorical data
    Wang, Chaojie
    Shen, Linghao
    Li, Han
    Fan, Xiaodan
    [J]. STATISTICS AND ITS INTERFACE, 2020, 13 (01) : 17 - 25