A Comparison of Random Forest-Based Missing Imputation Methods for Covariates in Propensity Score Analysis

Authors
Lee, Yongseok [1]
Leite, Walter L. [2]
Affiliations
[1] Univ Florida, Bur Econ & Business Res, 720 Southwest Second Ave Suite 150, Gainesville, FL 32611 USA
[2] Univ Florida, Sch Human Dev & Org Studies Educ, Gainesville, FL 32611 USA
Keywords
propensity score analysis; missing data; multivariate imputation by chained equations; machine learning; random forests; multiple imputation; chained equations; causal inference; sensitivity analysis; matching methods; models; assumption; robustness; statistics; variables
DOI
10.1037/met0000676
Chinese Library Classification
B84 [Psychology]
Discipline Classification Code
04; 0402
Abstract
Propensity score analysis (PSA) is a prominent method for reducing selection bias in observational studies, but missing data in covariates are common and must be handled during propensity score estimation. Through Monte Carlo simulations, this study evaluates imputation methods based on random forest algorithms for handling missing covariate data: multivariate imputation by chained equations-random forest (Caliber), proximity imputation (PI), and missForest. The results indicated that PI and missForest outperformed the other methods with respect to bias in the average treatment effect, regardless of sample size and missing data mechanism. A demonstration of these five methods with PSA, evaluating the effect of participation in center-based care on children's reading ability, is provided using data from the Early Childhood Longitudinal Study, Kindergarten Class of 2010-2011.
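The abstract judges each imputation method by the bias of the average treatment effect (ATE) obtained from PSA after the covariates have been imputed, but it does not say which propensity score estimator the study used. The sketch below is illustrative only and is not the authors' procedure. It assumes the covariates are already complete (for example, after missForest or PI imputation), fits a logistic propensity model, and estimates the ATE by normalized inverse probability weighting (IPW). The helper names `fit_logistic` and `ipw_ate` and the simulated data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, t, lr=0.5, iters=300):
    """Fit P(T=1 | x) = sigmoid(w0 + w.x) by batch gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            r = ti - sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi)))  # observed minus predicted
            grad[0] += r
            for j, xj in enumerate(xi):
                grad[j + 1] += r * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def ipw_ate(X, t, y):
    """Normalized (Hajek) inverse-probability-weighted ATE from estimated propensity scores."""
    w = fit_logistic(X, t)
    e = [sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) for xi in X]
    m1 = sum(ti * yi / ei for ti, yi, ei in zip(t, y, e)) / sum(ti / ei for ti, ei in zip(t, e))
    m0 = (sum((1 - ti) * yi / (1 - ei) for ti, yi, ei in zip(t, y, e))
          / sum((1 - ti) / (1 - ei) for ti, ei in zip(t, e)))
    return m1 - m0


# Hypothetical usage: simulated, already complete covariates with a true ATE of 2.
rng = random.Random(1)
X, t, y = [], [], []
for _ in range(2000):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    ti = 1 if rng.random() < sigmoid(0.5 * x1 - 0.5 * x2) else 0  # confounded treatment assignment
    X.append([x1, x2])
    t.append(ti)
    y.append(2.0 * ti + x1 - x2 + rng.gauss(0, 1))

naive = (sum(yi for yi, ti in zip(y, t) if ti) / sum(t)
         - sum(yi for yi, ti in zip(y, t) if not ti) / (len(t) - sum(t)))
estimate = ipw_ate(X, t, y)
print(f"naive difference: {naive:.3f}, IPW estimate: {estimate:.3f}")
```

Because treatment depends on the same covariates that drive the outcome, the naive difference in means is biased upward, while weighting by the estimated propensity scores moves the estimate back toward 2. In a study like this one, the quality of the imputed covariates feeds directly into `e`, and hence into the bias of the ATE.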
Pages: 18