Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

Cited by: 65
Authors
Eekhout, Iris [1, 2, 3]
van de Wiel, Mark A. [1, 4]
Heymans, Martijn W. [1, 2]
Affiliations
[1] Vrije Univ Amsterdam, Med Ctr, Dept Epidemiol & Biostat, Amsterdam, Netherlands
[2] Vrije Univ Amsterdam, Med Ctr, Amsterdam Publ Hlth Res Inst, Amsterdam, Netherlands
[3] Netherlands Org Appl Sci TNO, Dept Child Hlth, Schipholweg 77-89, NL-2316 ZL Leiden, Netherlands
[4] Vrije Univ Amsterdam, Dept Math, Amsterdam, Netherlands
Source
BMC Medical Research Methodology, 17
Keywords
Multiple imputation; Pooling; Categorical covariates; Significance test; Logistic regression; Simulation study; IMPUTED DATA; MISSING DATA; VALUES
DOI
10.1186/s12874-017-0404-7
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Background: Multiple imputation is a recommended method for handling missing data. For significance testing after multiple imputation, Rubin's Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, several methods are available to test whether a categorical covariate with more than two levels contributes significantly to the model: for example, pooling chi-square tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR and are not available in all mainstream statistical software packages. In addition, they do not always attain optimal power levels. We argue that the median of the p-values from the overall significance tests in the analyses of the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods for testing a categorical variable for significance after multiple imputation in terms of applicability and power.
Methods: In a large simulation study, we demonstrated the control of the type I error and the power levels of different pooling methods for categorical variables.
Results: The simulation study showed that, for non-significant categorical covariates, the type I error is controlled, and that the statistical power of the median pooling rule was at least equal to that of current multiple parameter tests. An empirical data example showed similar results.
Conclusions: Using the median of the p-values from the imputed data analyses is therefore an attractive and easy-to-use alternative method for significance testing of categorical variables.
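The two pooling ideas named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names `rubin_pool` and `median_p_rule` and all numbers are hypothetical, and the inputs are assumed to come from fitting the same logistic regression model on each of the m imputed datasets. Rubin's Rules are shown for a single scalar parameter (pooled estimate and total variance); the median rule takes the p-value of the overall test for the categorical covariate from each imputed dataset.

```python
from statistics import mean, median, variance


def rubin_pool(estimates, variances):
    """Pool one scalar parameter across m imputed datasets with Rubin's Rules.

    estimates: per-dataset point estimates of the parameter
    variances: per-dataset squared standard errors of that estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two imputed datasets")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def median_p_rule(p_values):
    """Pool overall-test p-values from m imputed datasets by taking their median."""
    p_values = list(p_values)
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    return median(p_values)


# Hypothetical p-values for illustration only (one overall test per imputed dataset).
example_p = [0.031, 0.048, 0.027, 0.062, 0.040]
pooled_p = median_p_rule(example_p)
print(pooled_p)
```

The median rule needs only the per-dataset p-values, which is why it can be applied in any software that reports an overall test for the covariate; the multi-parameter alternatives listed in the abstract need the full coefficient vectors and covariance matrices or the likelihoods.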
Pages: 12
Related papers (10 of 50 shown)
  • [1] Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis
    Iris Eekhout
    Mark A. van de Wiel
    Martijn W. Heymans
    [J]. BMC Medical Research Methodology, 17
  • [2] Estimation of logistic regression with covariates missing separately or simultaneously via multiple imputation methods
    Lee, Shen-Ming
    Le, Truong-Nhat
    Tran, Phuoc-Loc
    Li, Chin-Shang
    [J]. COMPUTATIONAL STATISTICS, 2023, 38 (02) : 899 - 934
  • [3] Estimation of logistic regression with covariates missing separately or simultaneously via multiple imputation methods
    Shen-Ming Lee
    Truong-Nhat Le
    Phuoc-Loc Tran
    Chin-Shang Li
    [J]. Computational Statistics, 2023, 38 : 899 - 934
  • [4] Population-calibrated multiple imputation for a binary/categorical covariate in categorical regression models
    Tra My Pham
    Carpenter, James R.
    Morris, Tim P.
    Wood, Angela M.
    Petersen, Irene
    [J]. STATISTICS IN MEDICINE, 2019, 38 (05) : 792 - 808
  • [5] Cox regression analysis with missing covariates via nonparametric multiple imputation
    Hsu, Chiu-Hsieh
    Yu, Mandi
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2019, 28 (06) : 1676 - 1688
  • [6] Testing the significance of categorical predictor variables in nonparametric regression models
    Racine, Jeffery S.
    Hart, Jeffrey
    Li, Qi
    [J]. ECONOMETRIC REVIEWS, 2006, 25 (04) : 523 - 544
  • [7] Evaluating model-based imputation methods for missing covariates in regression models with interactions
    Kim, Soeun
    Sugar, Catherine A.
    Belin, Thomas R.
    [J]. STATISTICS IN MEDICINE, 2015, 34 (11) : 1876 - 1888
  • [8] Semiparametric Bayesian multiple imputation for regression models with missing mixed continuous–discrete covariates
    Ryo Kato
    Takahiro Hoshino
    [J]. Annals of the Institute of Statistical Mathematics, 2020, 72 : 803 - 825
  • [9] Multiple imputation of a randomly censored covariate improves logistic regression analysis
    Atem, Folefac D.
    Qian, Jing
    Maye, Jacqueline E.
    Johnson, Keith A.
    Betensky, Rebecca A.
    [J]. JOURNAL OF APPLIED STATISTICS, 2016, 43 (15) : 2886 - 2896
  • [10] Semiparametric Bayesian multiple imputation for regression models with missing mixed continuous-discrete covariates
    Kato, Ryo
    Hoshino, Takahiro
    [J]. ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2020, 72 (03) : 803 - 825