Multiple Imputation for Missing Data via Sequential Regression Trees

Cited by: 182
Authors
Burgette, Lane F. [1]
Reiter, Jerome P. [1]
Affiliations
[1] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA
Keywords
diagnostic check; imputation; missing data; pregnancy outcome; regression tree
DOI
10.1093/aje/kwq260
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Discipline classification codes
1004; 120402
Abstract
Multiple imputation is particularly well suited to deal with missing data in large epidemiologic studies, because typically these studies support a wide range of analyses by many data users. Some of these analyses may involve complex modeling, including interactions and nonlinear relations. Identifying such relations and encoding them in imputation models, for example, in the conditional regressions for multiple imputation via chained equations, can be daunting tasks with large numbers of categorical and continuous variables. The authors present a nonparametric approach for implementing multiple imputation via chained equations by using sequential regression trees as the conditional models. This has the potential to capture complex relations with minimal tuning by the data imputer. Using simulations, the authors demonstrate that the method can result in more plausible imputations, and hence more reliable inferences, in complex settings than the naive application of standard sequential regression imputation techniques. They apply the approach to impute missing values in data on adverse birth outcomes with more than 100 clinical and survey variables. They evaluate the imputations using posterior predictive checks with several epidemiologic analyses of interest.
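The idea described in the abstract, chained equations with a regression tree as each conditional model, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes numeric variables only, marks missing values with None, assumes every column has at least one observed value, and imputes by a plain random draw from the observed values in the matching leaf. The names build_tree, find_leaf, and cart_mice and the parameters n_iter and min_leaf are hypothetical and chosen only for illustration.

```python
import random

def _sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def build_tree(X, y, min_leaf=5):
    """Grow a regression tree; each leaf stores its observed y values (donors)."""
    n = len(y)
    best = None  # (sse after split, feature index, threshold)
    if n >= 2 * min_leaf:
        for j in range(len(X[0])):
            order = sorted(range(n), key=lambda i: X[i][j])
            for k in range(min_leaf, n - min_leaf + 1):
                lo, hi = X[order[k - 1]][j], X[order[k]][j]
                if lo == hi:
                    continue  # cannot split between tied values
                sse = (_sse([y[i] for i in order[:k]])
                       + _sse([y[i] for i in order[k:]]))
                if best is None or sse < best[0]:
                    best = (sse, j, lo)
    if best is None or best[0] >= _sse(y):
        return {"donors": list(y)}  # leaf: no useful split found
    _, j, t = best
    left = [i for i in range(n) if X[i][j] <= t]
    right = [i for i in range(n) if X[i][j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], min_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], min_leaf),
    }

def find_leaf(tree, x):
    """Return the donor values of the leaf that the row x falls into."""
    while "donors" not in tree:
        side = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["donors"]

def cart_mice(data, n_iter=5, min_leaf=5, seed=0):
    """Produce one completed copy of data (list of rows, None = missing)."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    miss = [[row[j] is None for j in range(p)] for row in data]
    filled = [list(row) for row in data]
    # Initial fill: draw each missing cell from the observed values of its column.
    for j in range(p):
        obs = [data[i][j] for i in range(n) if not miss[i][j]]
        for i in range(n):
            if miss[i][j]:
                filled[i][j] = rng.choice(obs)
    # Chained equations: cycle through the incomplete variables.
    for _ in range(n_iter):
        for j in range(p):
            if not any(miss[i][j] for i in range(n)):
                continue
            others = [[row[k] for k in range(p) if k != j] for row in filled]
            obs_idx = [i for i in range(n) if not miss[i][j]]
            tree = build_tree([others[i] for i in obs_idx],
                              [filled[i][j] for i in obs_idx], min_leaf)
            for i in range(n):
                if miss[i][j]:
                    filled[i][j] = rng.choice(find_leaf(tree, others[i]))
    return filled
```

Calling cart_mice several times with different seed values would give several completed data sets, which could then be analyzed separately and combined with the usual multiple imputation rules. Because the tree splits are chosen from the data, a nonlinear relation such as y = x^2 needs no explicit term in the imputation model, which is the kind of minimal tuning the abstract refers to.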
Pages: 1070-1076
Number of pages: 7
Related papers
50 in total
  • [41] Missing Data and Multiple Imputation in Rheumatoid Arthritis Registries Using Sequential Random Forest Method
    Al-Saber, Ahmed
    Al-Herz, Adeeba
    Pan, Jiazhu
    Saleh, Khulood
    Al-Awadhi, Adel
    Al-Kandari, Waleed
    Hasan, Eman
    Ghanem, Aqeel
    Hussain, Mohammed
    Ali, Yaser
    Nahar, Ebrahim
    Alenizi, Ahmad
    Hayat, Sawsan
    Abutiban, Fatemah
    Aldei, Ali
    Alkadi, Amjad
    Alhajeri, Heba
    Behbehani, Husain
    Alhadhood, Naser
    Mokaddem, Khaled
    Khadrawy, Ahmed
    Fazal, Ammad
    Zaman, Agaz
    Mazloum, Ghada
    Bartella, Youssef
    Hamed, Sally
    Alsouk, Ramia
    [J]. ARTHRITIS & RHEUMATOLOGY, 2020, 72
  • [42] MISSING DATA AND MULTIPLE IMPUTATION IN RHEUMATOID ARTHRITIS REGISTRIES USING SEQUENTIAL RANDOM FOREST METHOD
    Alsaber, A.
    Al-Herz, A.
    Pan, J.
    Saleh, K.
    Al-Awadhi, A.
    Al-Kandari, W.
    Hasan, E.
    Ghanem, A.
    Hussain, M.
    Ali, Y.
    Nahar, E.
    Alenizi, A.
    Hayat, S.
    Abutiban, F.
    Aledei, A.
    Al-Qadhi, A.
    Alhajeri, H.
    Behbehani, H.
    Alhadhood, N.
    [J]. ANNALS OF THE RHEUMATIC DISEASES, 2020, 79 : 515 - 515
  • [43] Analysing Mark–Recapture–Recovery Data in the Presence of Missing Covariate Data Via Multiple Imputation
    Hannah Worthington
    Ruth King
    Stephen T. Buckland
    [J]. Journal of Agricultural, Biological, and Environmental Statistics, 2015, 20 : 28 - 46
  • [44] Handling missing data in trees: Surrogate splits or statistical imputation?
    Feelders, A
    [J]. PRINCIPLES OF DATA MINING AND KNOWLEDGE DISCOVERY, 1999, 1704 : 329 - 334
  • [45] Analysis of Longitudinal Clinical Trials with Missing Data Using Multiple Imputation in Conjunction with Robust Regression
    Mehrotra, Devan V.
    Li, Xiaoming
    Liu, Jiajun
    Lu, Kaifeng
    [J]. BIOMETRICS, 2012, 68 (04) : 1250 - 1259
  • [46] The performance of multiple imputation for missing covariate data within the context of regression relative survival analysis
    Giorgi, Roch
    Belot, Aurelien
    Gaudart, Jean
    Launoy, Guy
    [J]. STATISTICS IN MEDICINE, 2008, 27 (30) : 6310 - 6331
  • [47] Weighted multiple blockwise imputation method for high-dimensional regression with blockwise missing data
    Li, Jingmao
    Zhang, Qingzhao
    Chen, Song
    Fang, Kuangnan
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2023, 93 (03) : 459 - 474
  • [48] Evolving Regression Trees Robust to Missing Data
    Blomberg, Luciano C.
    Barros, Rodrigo C.
    Ruiz, Duncan D.
    [J]. 30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 102 - 109
  • [49] Data Imputation for Symbolic Regression with Missing Values: A Comparative Study
    Al-Helali, Baligh
    Chen, Qi
    Xue, Bing
    Zhang, Mengjie
    [J]. 2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2020, : 2093 - 2100
  • [50] Regression-based imputation of explanatory discrete missing data
    Hernandez-Herrera, Gilma
    Navarro, Albert
    Morina, David
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2022, 53 (09) : 4363 - 4379