Multiple Imputation for Missing Data via Sequential Regression Trees

Cited by: 182
Authors
Burgette, Lane F. [1]
Reiter, Jerome P. [1]
Affiliations
[1] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA
Keywords
diagnostic check; imputation; missing data; pregnancy outcome; regression tree
DOI
10.1093/aje/kwq260
Chinese Library Classification number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Multiple imputation is particularly well suited to deal with missing data in large epidemiologic studies, because typically these studies support a wide range of analyses by many data users. Some of these analyses may involve complex modeling, including interactions and nonlinear relations. Identifying such relations and encoding them in imputation models, for example, in the conditional regressions for multiple imputation via chained equations, can be daunting tasks with large numbers of categorical and continuous variables. The authors present a nonparametric approach for implementing multiple imputation via chained equations by using sequential regression trees as the conditional models. This has the potential to capture complex relations with minimal tuning by the data imputer. Using simulations, the authors demonstrate that the method can result in more plausible imputations, and hence more reliable inferences, in complex settings than the naive application of standard sequential regression imputation techniques. They apply the approach to impute missing values in data on adverse birth outcomes with more than 100 clinical and survey variables. They evaluate the imputations using posterior predictive checks with several epidemiologic analyses of interest.
Pages: 1070-1076
Number of pages: 7
Related papers
50 in total
  • [1] MISSING DATA, IMPUTATION AND REGRESSION TREES
    Loh, Wei-Yin
    Zhang, Qiong
    Zhang, Wenwen
    Zhou, Peigen
    [J]. STATISTICA SINICA, 2020, 30 (04) : 1697 - 1722
  • [2] Missing data imputation using classification and regression trees
    Chen, Cheng-Yang
    Chang, Yu-Wei
    [J]. PEERJ COMPUTER SCIENCE, 2024, 10
  • [3] Regression multiple imputation for missing data analysis
    Yu, Lili
    Liu, Liang
    Peace, Karl E.
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2020, 29 (09) : 2647 - 2664
  • [4] Imputation Methods for Multiple Regression with Missing Heteroscedastic Data
    Asif, Muhammad
    Samart, Klairung
    [J]. THAILAND STATISTICIAN, 2022, 20 (01) : 1 - 15
  • [5] AN APPLICATION OF SEQUENTIAL REGRESSION MULTIPLE IMPUTATION ON PANEL DATA
    Von Maltitz, Michael Johan
    Van der Merwe, Abraham Johannes
    [J]. SOUTH AFRICAN JOURNAL OF ECONOMICS, 2012, 80 (01) : 77 - 90
  • [6] Application of Sequential Regression Multivariate Imputation Method on Multivariate Normal Missing Data
    Nurzaman
    Siswantining, Titin
    Soemartojo, Saskya Mary
    Sarwinda, Devvi
    [J]. 2019 3RD INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTATIONAL SCIENCES (ICICOS 2019), 2019
  • [7] Cox regression analysis with missing covariates via nonparametric multiple imputation
    Hsu, Chiu-Hsieh
    Yu, Mandi
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2019, 28 (06) : 1676 - 1688
  • [8] Missing Data and Multiple Imputation
    Cummings, Peter
    [J]. JAMA PEDIATRICS, 2013, 167 (07) : 656 - 661
  • [9] Multiple imputation with sequential penalized regression
    Zahid, Faisal M.
    Heumann, Christian
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2019, 28 (05) : 1311 - 1327
  • [10] Multiple imputation for missing data
    Patrician, PA
    [J]. RESEARCH IN NURSING & HEALTH, 2002, 25 (01) : 76 - 84