Multiple Imputation for Missing Data via Sequential Regression Trees

Times cited: 182
Authors
Burgette, Lane F. [1]
Reiter, Jerome P. [1]
Affiliation
[1] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA
Keywords
diagnostic check; imputation; missing data; pregnancy outcome; regression tree
DOI
10.1093/aje/kwq260
Chinese Library Classification (CLC) number
R1 [Preventive Medicine and Hygiene]
Subject classification codes
1004; 120402
Abstract
Multiple imputation is particularly well suited to deal with missing data in large epidemiologic studies, because typically these studies support a wide range of analyses by many data users. Some of these analyses may involve complex modeling, including interactions and nonlinear relations. Identifying such relations and encoding them in imputation models, for example, in the conditional regressions for multiple imputation via chained equations, can be daunting tasks with large numbers of categorical and continuous variables. The authors present a nonparametric approach for implementing multiple imputation via chained equations by using sequential regression trees as the conditional models. This has the potential to capture complex relations with minimal tuning by the data imputer. Using simulations, the authors demonstrate that the method can result in more plausible imputations, and hence more reliable inferences, in complex settings than the naive application of standard sequential regression imputation techniques. They apply the approach to impute missing values in data on adverse birth outcomes with more than 100 clinical and survey variables. They evaluate the imputations using posterior predictive checks with several epidemiologic analyses of interest.
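The chained-equations procedure with tree-based conditional models described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: all variables are continuous (categorical variables would need classification trees instead), each tree is grown by greedy squared-error splits with a depth limit and minimum leaf size, and each missing entry is imputed by drawing a donor from the observed values in the same leaf using Bayesian-bootstrap (Dirichlet) weights. The names `fit_tree`, `leaf_values`, and `cart_mice` and the parameters `min_leaf`, `max_depth`, `n_iter`, and `m` are hypothetical choices made for this sketch.

```python
import numpy as np


def fit_tree(X, y, min_leaf=5, max_depth=4):
    """Grow a small regression tree by greedy SSE-minimizing splits.
    Internal nodes store (var, thr); leaves store the observed y values,
    which later serve as the donor pool for imputation."""
    n = len(y)
    if max_depth == 0 or n < 2 * min_leaf:
        return {"leaf": y}
    base_sse = ((y - y.mean()) ** 2).sum()
    best = None  # (sse, column index, threshold)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        # Cumulative sums give the SSE of each candidate split in O(1).
        csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
        for i in range(min_leaf, n - min_leaf + 1):
            if xs[i] == xs[i - 1]:
                continue  # cannot separate tied predictor values
            ls, lq = csum[i - 1], csq[i - 1]
            rs, rq = csum[-1] - ls, csq[-1] - lq
            sse = (lq - ls ** 2 / i) + (rq - rs ** 2 / (n - i))
            if best is None or sse < best[0]:
                best = (sse, j, xs[i - 1])  # left child: X[:, j] <= xs[i - 1]
    if best is None or best[0] >= base_sse:
        return {"leaf": y}
    _, j, thr = best
    mask = X[:, j] <= thr
    return {"var": j, "thr": thr,
            "left": fit_tree(X[mask], y[mask], min_leaf, max_depth - 1),
            "right": fit_tree(X[~mask], y[~mask], min_leaf, max_depth - 1)}


def leaf_values(tree, x):
    """Route one row x down the tree and return the donor values in its leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["var"]] <= tree["thr"] else tree["right"]
    return tree["leaf"]


def cart_mice(data, n_iter=5, m=5, seed=0, **tree_kw):
    """Return m completed copies of `data` (2-D float array, NaN = missing)."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(data)
    completed = []
    for _ in range(m):
        X = data.copy()
        # Start each column's missing entries with random draws from its observed values.
        for j in range(X.shape[1]):
            obs = X[~miss[:, j], j]
            X[miss[:, j], j] = rng.choice(obs, size=miss[:, j].sum())
        # Chained equations: cycle through variables, re-imputing each from the others.
        for _ in range(n_iter):
            for j in range(X.shape[1]):
                if not miss[:, j].any():
                    continue
                others = np.delete(X, j, axis=1)
                # Fit only on rows where column j was actually observed.
                tree = fit_tree(others[~miss[:, j]], X[~miss[:, j], j], **tree_kw)
                for i in np.flatnonzero(miss[:, j]):
                    donors = leaf_values(tree, others[i])
                    # Bayesian bootstrap within the leaf: Dirichlet(1, ..., 1) donor weights.
                    w = rng.dirichlet(np.ones(len(donors)))
                    X[i, j] = rng.choice(donors, p=w)
        completed.append(X)
    return completed
```

Each of the `m` completed arrays would then be analyzed separately and the results pooled with the standard multiple-imputation combining rules. Drawing donors from a leaf, rather than plugging in the leaf mean, keeps imputations within the range of observed values and preserves the variability among similar cases, which is one way a tree can capture interactions and nonlinear relations without the imputer specifying them.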
Pages: 1070-1076
Number of pages: 7