A Comparison of Random Forest-Based Missing Imputation Methods for Covariates in Propensity Score Analysis

Cited by: 0
Authors
Lee, Yongseok [1]
Leite, Walter L. [2]
Affiliations
[1] Univ Florida, Bur Econ & Business Res, 720 Southwest Second Ave Suite 150, Gainesville, FL 32611 USA
[2] Univ Florida, Sch Human Dev & Org Studies Educ, Gainesville, FL 32611 USA
Keywords
propensity score analysis; missing data; multivariate imputation by chained equations; machine learning; random forests; MULTIPLE IMPUTATION; CHAINED EQUATIONS; CAUSAL INFERENCE; SENSITIVITY-ANALYSIS; MATCHING METHODS; MODELS; ASSUMPTION; ROBUSTNESS; STATISTICS; VARIABLES
DOI
10.1037/met0000676
Chinese Library Classification
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
Propensity score analysis (PSA) is a prominent method to alleviate selection bias in observational studies, but missing data in covariates is prevalent and must be dealt with during propensity score estimation. Through Monte Carlo simulations, this study evaluates the use of imputation methods based on multiple random forests algorithms to handle missing data in covariates: multivariate imputation by chained equations-random forest (Caliber), proximity imputation (PI), and missForest. The results indicated that PI and missForest outperformed other methods with respect to bias of average treatment effect regardless of sample size and missing mechanisms. A demonstration of these five methods with PSA to evaluate the effect of participation in center-based care on children's reading ability is provided using data from the Early Childhood Longitudinal Study, Kindergarten Class of 2010-2011.
Pages: 18
Related papers
50 records in total
  • [21] Authors’ Reply: A comparison of different methods to handle missing data in the context of propensity score analysis
    Jungyeon Choi
    Olaf M. Dekkers
    Saskia le Cessie
    European Journal of Epidemiology, 2020, 35 : 89 - 91
  • [22] Authors' Reply: A comparison of different methods to handle missing data in the context of propensity score analysis
    Choi, Jungyeon
    Dekkers, Olaf M.
    le Cessie, Saskia
    EUROPEAN JOURNAL OF EPIDEMIOLOGY, 2020, 35 (01) : 89 - 91
  • [23] Imputation of missing well log data by random forest and its uncertainty analysis
    Feng, Runhai
    Grana, Dario
    Balling, Niels
    COMPUTERS & GEOSCIENCES, 2021, 152
  • [24] ImputeSCOPA: a Fast, Random Forest-Based Phenotype Imputation Tool for Large-Scale Studies
    Kaakinen, M.
    Anasanti, M.
    Jarvelin, M-R
    Prokopenko, I
    HUMAN HEREDITY, 2020, 84 (4-5) : 212 - 212
  • [25] Multiple imputation of missing blood pressure covariates in survival analysis
    Van Buuren, S
    Boshuizen, HC
    Knook, DL
    STATISTICS IN MEDICINE, 1999, 18 (06) : 681 - 694
  • [26] Causal inference in the presence of missing data using a random forest-based matching algorithm
    Hillis, Tristan
    Guarcello, Maureen A.
    Levine, Richard A.
    Fan, Juanjuan
    STAT, 2021, 10 (01):
  • [27] Propensity Score Analysis With Missing Data
    Cham, Heining
    West, Stephen G.
    PSYCHOLOGICAL METHODS, 2016, 21 (03) : 427 - 445
  • [28] Imputation methods for quantile estimation under missing at random
    Yang, Shu
    Kim, Jae-Kwang
    Shin, Dong Wan
    STATISTICS AND ITS INTERFACE, 2013, 6 (03) : 369 - 377
  • [29] Missing Data Imputation Through the Use of the Random Forest Algorithm
    Pantanowitz, Adam
    Marwala, Tshilidzi
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, 2009, 61 : 53 - 62
  • [30] COMPARING PROPENSITY SCORE, PROPENSITY SCORE WITH COVARIATES AND GENETIC ALGORITHM METHODS FOR COVARIATE MATCHING IN OBSERVATIONAL STUDIES
    Claeys, C.
    Bakken, D. G.
    Wasserman, D.
    Spilman, J.
    VALUE IN HEALTH, 2014, 17 (03) : A200 - A200