Efficient and doubly robust imputation for covariate-dependent missing responses

Cited by: 52
Authors
Qin, Jing [1]
Shao, Jun [2]
Zhang, Biao [3]
Affiliations
[1] NIAID, Biostat Res Branch, NIH, Bethesda, MD 20892 USA
[2] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[3] Univ Toledo, Dept Math, Toledo, OH 43606 USA
Funding
National Science Foundation (USA)
Keywords
covariate-dependent missing mechanism; doubly robust; imputation; local efficiency; model-assisted
DOI
10.1198/016214508000000238
Chinese Library Classification number
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
In this article we study a well-known response missing-data problem. Missing data are a ubiquitous problem in medical and social science studies. Imputation is one of the most popular methods for dealing with missing data. The most commonly used imputation that makes use of covariates is regression imputation, in which the regression model can be parametric, semiparametric, or nonparametric. Parametric regression imputation is efficient but is not robust against misspecification of the regression model. Although nonparametric regression imputation (such as nearest-neighbor imputation and kernel regression imputation) is model-free, it is not efficient, especially if the dimension of the covariate vector is high (the well-known curse of dimensionality). Semiparametric regression imputation (such as partially linear regression imputation) can reduce the dimension of the covariate in nonparametric regression fitting but is not robust against misspecification of the linear component in the regression. Assuming that the missing mechanism is covariate-dependent and that the propensity function can be specified correctly, we propose a regression imputation method that has good efficiency and is robust against regression model misspecification. Furthermore, our method is valid as long as either the regression model or the propensity model is correct, a property known as double robustness. We show that asymptotically the sample mean based on our imputation achieves the semiparametric efficiency lower bound if both the regression and propensity models are specified correctly. Our simulation results demonstrate that the proposed method outperforms many existing methods for handling missing data, especially when the regression model is misspecified. As an illustration, an economic observational data set is analyzed.
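The double-robustness property described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' imputation procedure: it uses the standard augmented inverse-probability-weighted (AIPW) estimator of a mean, mu_hat = (1/n) * sum_i [ m_hat(x_i) + r_i * (y_i - m_hat(x_i)) / pi_hat(x_i) ], which shares that property. The data-generating model, the sample size, and the helper names `fit_logistic` and `aipw_mean` are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25):
    """Newton-Raphson fit of a logistic propensity model P(r = 1 | X)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (r - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def aipw_mean(y, r, X_reg, X_prop):
    """AIPW estimate of E[Y] when y is observed only where r == 1."""
    obs = r == 1
    # Outcome regression m_hat(x): least squares on respondents only.
    coef, *_ = np.linalg.lstsq(X_reg[obs], y[obs], rcond=None)
    m_hat = X_reg @ coef
    # Propensity model pi_hat(x): logistic regression of r on X_prop.
    pi_hat = 1.0 / (1.0 + np.exp(-(X_prop @ fit_logistic(X_prop, r))))
    # Residual correction uses respondents only; missing y contributes 0.
    resid = np.where(obs, y - m_hat, 0.0)
    return float(np.mean(m_hat + resid / pi_hat))

# Hypothetical data: linear outcome, covariate-dependent missingness.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)      # true mean of Y is 1
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))         # P(respond | x)
r = (rng.uniform(size=n) < p_obs).astype(float)
y = np.where(r == 1, y_full, np.nan)             # mask missing responses

X_lin = np.column_stack([np.ones(n), x])         # correctly specified design
X_const = np.ones((n, 1))                        # misspecified: intercept only

naive = float(np.mean(y[r == 1]))                # complete-case mean (biased)
est_both = aipw_mean(y, r, X_lin, X_lin)         # both models correct
est_bad_reg = aipw_mean(y, r, X_const, X_lin)    # regression model wrong
est_bad_prop = aipw_mean(y, r, X_lin, X_const)   # propensity model wrong

print(naive, est_both, est_bad_reg, est_bad_prop)
```

In this setup the complete-case mean is biased upward because respondents tend to have larger covariate values, whereas each AIPW estimate should land near the true mean of 1 as long as at least one of the two working models is correctly specified.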
Pages: 797-810
Number of pages: 14