Imputation and variable selection in linear regression models with missing covariates

Cited by: 48
Authors
Yang, XW [1]
Belin, TR [1]
Boscardin, WJ [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Biostat, Los Angeles, CA 90095 USA
Keywords
Bayesian variable selection; MCMC; model averaging; multiple imputation
DOI
10.1111/j.1541-0420.2005.00317.x
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS) involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.
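The "impute, then select" idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The imputation step is simplified (parameters are estimated from complete cases and not redrawn), the stochastic search variable selection sampler uses a spike-and-slab normal prior with illustrative hyperparameters `tau`, `c`, and `p`, and the function names `impute_once`, `ssvs_draws`, and `impute_then_select` are hypothetical. Pooling the inclusion-indicator draws across imputed data sets is one plausible reading of how the results could be combined.

```python
import numpy as np

def impute_once(Z, rng):
    """Fill NaNs with draws from a conditional normal distribution.
    Simplification: mean and covariance come from complete cases only."""
    complete = Z[~np.isnan(Z).any(axis=1)]
    mu, S = complete.mean(axis=0), np.cov(complete, rowvar=False)
    out = Z.copy()
    for i in np.where(np.isnan(Z).any(axis=1))[0]:
        m = np.isnan(Z[i])          # missing entries in row i
        o = ~m                      # observed entries in row i
        B = S[np.ix_(m, o)] @ np.linalg.inv(S[np.ix_(o, o)])
        cond_mean = mu[m] + B @ (Z[i, o] - mu[o])
        cond_cov = S[np.ix_(m, m)] - B @ S[np.ix_(o, m)]
        out[i, m] = rng.multivariate_normal(cond_mean, (cond_cov + cond_cov.T) / 2)
    return out

def ssvs_draws(X, y, rng, n_iter=600, burn=100, tau=0.05, c=50.0, p=0.5):
    """Gibbs sampler for spike-and-slab variable selection (assumed settings).
    Returns the post-burn-in draws of the inclusion indicators."""
    n, k = X.shape
    gamma, sigma2 = np.ones(k, dtype=int), 1.0
    XtX, Xty = X.T @ X, X.T @ y
    keep = []
    for it in range(n_iter):
        # beta | gamma, sigma2: normal, prior sd tau (spike) or c * tau (slab)
        sd = np.where(gamma == 1, c * tau, tau)
        A = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / sd**2))
        beta = rng.multivariate_normal(A @ Xty / sigma2, (A + A.T) / 2)
        # sigma2 | beta: inverse gamma under the prior p(sigma2) ~ 1/sigma2
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(n / 2.0, 2.0 / (resid @ resid))
        # gamma_j | beta_j: Bernoulli from the slab versus spike densities
        slab = p * np.exp(-0.5 * (beta / (c * tau)) ** 2) / (c * tau)
        spike = (1 - p) * np.exp(-0.5 * (beta / tau) ** 2) / tau
        gamma = (rng.random(k) < slab / (slab + spike)).astype(int)
        if it >= burn:
            keep.append(gamma)
    return np.array(keep)

def impute_then_select(Z, rng, n_imputations=5):
    """Impute several times, run the sampler on each completed data set,
    and pool the indicator draws into inclusion probabilities."""
    draws = []
    for _ in range(n_imputations):
        Zi = impute_once(Z, rng)
        Zi = Zi - Zi.mean(axis=0)   # center so no intercept is needed
        draws.append(ssvs_draws(Zi[:, :-1], Zi[:, -1], rng))
    return np.vstack(draws).mean(axis=0)

# Simulated example: covariates 0 and 2 matter, covariate 0 is 20% missing.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 1.0 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(size=n)
Z = np.column_stack([X, y])         # last column is the response
Z[rng.random(n) < 0.2, 0] = np.nan
probs = impute_then_select(Z, rng)
print(np.round(probs, 2))           # pooled posterior inclusion probabilities
```

A "simultaneously impute and select" variant would instead redraw the missing covariate values inside each iteration of the Gibbs loop, conditional on the current regression parameters. That variant is not shown here.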
Pages: 498-506
Number of pages: 9