Recursive partitioning on incomplete data using surrogate decisions and multiple imputation

Cited by: 36
Authors
Hapfelmeier, A. [1 ]
Hothorn, T. [2 ]
Ulm, K. [1 ]
Affiliations
[1] Tech Univ Munich, Inst Med Stat & Epidemiol, D-81675 Munich, Germany
[2] Univ Munich, Inst Stat, D-80539 Munich, Germany
Keywords
Recursive partitioning; Classification and regression trees; Random Forests; Multiple imputation; MICE; Surrogates; VARIABLE IMPORTANCE MEASURES; REGRESSION TREES; MISSING DATA; CLASSIFICATION; PREDICTION; INFERENCE; DISCRETE; VALUES;
DOI
10.1016/j.csda.2011.09.024
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The occurrence of missing data is a major problem in statistical data analysis. All scientific fields and data of all kinds and sizes are affected by this problem. There are a number of ad hoc solutions, which unfortunately lead to a loss of power, biased inference, underestimation of variability and distorted relationships between variables. A more promising approach of rising popularity is multiple imputation by chained equations (MICE), also known as imputation by full conditional specification (FCS). Alternatives to imputation are given by methods with built-in procedures for handling missing values. These include recursive partitioning by classification and regression trees as well as the corresponding Random Forests. However, there is little literature comparing the two approaches. Existing evaluations often lack generalizability due to restrictions on data structure and simulation schemes. The application of both methods to several kinds of data and different simulation settings is meant to improve and extend the comparative analyses. Classification and regression studies are examined. Recursive partitioning is executed by two popular tree implementations and one Random Forest implementation. Findings show that multiple imputation produces ambiguous performance results for both simulated and real-life data. Using surrogates instead is a fast and simple way to achieve performance that is only negligibly worse and in many cases even superior. (C) 2012 Published by Elsevier B.V.
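As a rough illustration of the surrogate idea mentioned in the abstract, the following Python sketch shows how a tree node might route an observation whose primary split variable is missing. It is a simplified sketch under stated assumptions, not the implementation evaluated in the paper: the function names `surrogate_threshold` and `goes_left` are hypothetical, only one surrogate variable is considered, and only "less than or equal" cuts in the same direction as the primary split are searched.

```python
from typing import Optional, Sequence, Tuple

Value = Optional[float]  # None marks a missing value (assumption of this sketch)


def surrogate_threshold(primary: Sequence[Value], surrogate: Sequence[Value],
                        primary_cut: float) -> Tuple[float, float]:
    """Pick the cut on `surrogate` that best mimics `primary <= primary_cut`.

    Only rows where both variables are observed are used. Returns the chosen
    cut and its agreement rate with the primary split.
    """
    pairs = [(p <= primary_cut, s) for p, s in zip(primary, surrogate)
             if p is not None and s is not None]
    best_cut, best_agree = float("nan"), -1.0
    for cut in sorted({s for _, s in pairs}):
        # Fraction of rows sent to the same side by both splits.
        agree = sum((s <= cut) == left for left, s in pairs) / len(pairs)
        if agree > best_agree:
            best_cut, best_agree = cut, agree
    return best_cut, best_agree


def goes_left(x_primary: Value, x_surrogate: Value,
              primary_cut: float, surrogate_cut: float,
              majority_left: bool) -> bool:
    """Route one observation: primary split, else surrogate, else majority."""
    if x_primary is not None:
        return x_primary <= primary_cut
    if x_surrogate is not None:
        return x_surrogate <= surrogate_cut
    return majority_left
```

The fallback to the majority direction when both variables are missing is one common convention; other implementations may handle that case differently.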
Pages: 1552 - 1565
Page count: 14
Related papers
50 in total
  • [21] Multiple Imputation by Generative Adversarial Networks for Classification with Incomplete Data
    Bao Ngoc Vi
    Dinh Tan Nguyen
    Cao Truong Tran
    Huu Phuc Ngo
    Chi Cong Nguyen
    Hai-Hong Phan
    [J]. 2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021), 2021, : 162 - 167
  • [22] A comparison of multiple imputation methods for incomplete longitudinal binary data
    Yamaguchi, Yusuke
    Misumi, Toshihiro
    Maruo, Kazushi
    [J]. JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2018, 28 (04) : 645 - 667
  • [23] Multiple imputation of incomplete zero-inflated count data
    Kleinke, Kristian
    Reinecke, Jost
    [J]. STATISTICA NEERLANDICA, 2013, 67 (03) : 311 - 336
  • [24] Handling Incomplete Data Using Evolution of Imputation Methods
    Zawistowski, Pawel
    Grzenda, Maciej
    [J]. ADAPTIVE AND NATURAL COMPUTING ALGORITHMS, 2009, 5495 : 22 - +
  • [25] Recursive Partitioning for Personalization using Observational Data
    Kallus, Nathan
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [26] Imputation Methods for Incomplete Data
    Umathe, Vaishali H.
    Chaudhary, Gauri
    [J]. 2015 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION, EMBEDDED AND COMMUNICATION SYSTEMS (ICIIECS), 2015,
  • [27] Cost-effectiveness in clinical trials: using multiple imputation to deal with incomplete cost data
    Burton, Andrea
    Billingham, Lucinda Jane
    Bryan, Stirling
    [J]. CLINICAL TRIALS, 2007, 4 (02) : 154 - 161
  • [28] Difference Between Binomial Proportions Using Newcombe's Method With Multiple Imputation for Incomplete Data
    Sidi, Yulia
    Harel, Ofer
    [J]. AMERICAN STATISTICIAN, 2022, 76 (01) : 29 - 36
  • [29] Multiple imputation combined with bootstrapping for analysing incomplete cost and effect data
    Heymans, M. W.
    De Bruyne, M. C.
    Van Buuren, S.
    [J]. EUROPEAN JOURNAL OF EPIDEMIOLOGY, 2006, 21 : 57 - 57
  • [30] Multiple imputation confidence intervals for the mean of the discrete distributions for incomplete data
    Lee, Chung-Han
    Wang, Hsiuying
    [J]. STATISTICS IN MEDICINE, 2022, 41 (07) : 1172 - 1190