A Comparison of Random Forest-Based Missing Imputation Methods for Covariates in Propensity Score Analysis

Times cited: 0
Authors
Lee, Yongseok [1]
Leite, Walter L. [2]
Affiliations
[1] Univ Florida, Bur Econ & Business Res, 720 Southwest Second Ave Suite 150, Gainesville, FL 32611 USA
[2] Univ Florida, Sch Human Dev & Org Studies Educ, Gainesville, FL 32611 USA
Keywords
propensity score analysis; missing data; multivariate imputation by chained equations; machine learning; random forests; MULTIPLE IMPUTATION; CHAINED EQUATIONS; CAUSAL INFERENCE; SENSITIVITY-ANALYSIS; MATCHING METHODS; MODELS; ASSUMPTION; ROBUSTNESS; STATISTICS; VARIABLES
DOI
10.1037/met0000676
Chinese Library Classification
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
Propensity score analysis (PSA) is a prominent method to alleviate selection bias in observational studies, but missing data in covariates are prevalent and must be dealt with during propensity score estimation. Through Monte Carlo simulations, this study evaluates the use of imputation methods based on multiple random forest algorithms to handle missing data in covariates: multivariate imputation by chained equations-random forest (Caliber), proximity imputation (PI), and missForest. The results indicated that PI and missForest outperformed other methods with respect to bias of the average treatment effect regardless of sample size and missing data mechanism. A demonstration of these five methods with PSA to evaluate the effect of participation in center-based care on children's reading ability is provided using data from the Early Childhood Longitudinal Study, Kindergarten Class of 2010-2011.
Pages: 18
Related papers
50 items in total
  • [31] Imputation of missing longitudinal data: a comparison of methods
    Engels, JM
    Diehr, P
    JOURNAL OF CLINICAL EPIDEMIOLOGY, 2003, 56 (10) : 968 - 976
  • [32] Missing traffic data: comparison of imputation methods
    Li, Yuebiao
    Li, Zhiheng
    Li, Li
    IET INTELLIGENT TRANSPORT SYSTEMS, 2014, 8 (01) : 51 - 57
  • [33] Missing Data Imputation Method Combining Random Forest and Generative Adversarial Imputation Network
    Ou, Hongsen
    Yao, Yunan
    He, Yi
    SENSORS, 2024, 24 (04)
  • [34] Missing data analysis in cognitive diagnostic models: Random forest threshold imputation method
    You Xiaofeng
    Yang Jianqin
    Qin Chunying
    Liu Hongyun
    ACTA PSYCHOLOGICA SINICA, 2023, 55 (07) : 1192 - 1206
  • [35] When Data Goes Missing: Methods for Missing Score Imputation in Biometric Fusion
    Ding, Yaohui
    Ross, Arun
    BIOMETRIC TECHNOLOGY FOR HUMAN IDENTIFICATION VII, 2010, 7667
  • [36] Imputation of Missing Covariate Data Prior to Propensity Score Analysis: A Tutorial and Evaluation of the Robustness of Practical Approaches
    Leite, Walter L.
    Aydin, Burak
    Cetin-Berber, Dee D.
    EVALUATION REVIEW, 2021, 45 (1-2) : 34 - 69
  • [37] Multiple imputation analysis for propensity score matching with missing causes of failure: An application to hepatocellular carcinoma data
    Han, Seungbong
    Tsui, Kam-Wah
    Zhang, Hui
    Kim, Gi-Ae
    Lim, Young-Suk
    Andrei, Adin-Cristian
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2021, 30 (10) : 2313 - 2328
  • [38] Regression Analysis with Covariates Missing at Random: A Piece-wise Nonparametric Model for Missing Covariates
    Zhao, Yang
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2009, 38 (20) : 3736 - 3744
  • [39] Cox regression analysis with missing covariates via nonparametric multiple imputation
    Hsu, Chiu-Hsieh
    Yu, Mandi
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2019, 28 (06) : 1676 - 1688
  • [40] The Comparative Performance of Logistic Regression and Random Forest in Propensity Score Methods: a Simulation Study
    Ali, M. Sanni
    Khalid, Sara
    Collins, Gary S.
    Prieto-Alhambra, Daniel
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2017, 26 : 489 - 489