Recursive partitioning on incomplete data using surrogate decisions and multiple imputation

Cited by: 36
Authors
Hapfelmeier, A. [1]
Hothorn, T. [2]
Ulm, K. [1]
Affiliations
[1] Tech Univ Munich, Inst Med Stat & Epidemiol, D-81675 Munich, Germany
[2] Univ Munich, Inst Stat, D-80539 Munich, Germany
Keywords
Recursive partitioning; Classification and regression trees; Random Forests; Multiple imputation; MICE; Surrogates; VARIABLE IMPORTANCE MEASURES; REGRESSION TREES; MISSING DATA; CLASSIFICATION; PREDICTION; INFERENCE; DISCRETE; VALUES
DOI
10.1016/j.csda.2011.09.024
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The occurrence of missing data is a major problem in statistical data analysis. All scientific fields and data of all kinds and sizes are affected by this problem. There are a number of ad hoc solutions, which unfortunately lead to a loss of power, biased inference, underestimation of variability, and distorted relationships between variables. A more promising approach of rising popularity is multiple imputation by chained equations (MICE), also known as imputation by full conditional specification (FCS). Alternatives to imputation are given by methods with built-in procedures. These include recursive partitioning by classification and regression trees as well as the corresponding Random Forests. However, there is little literature comparing the two approaches. Existing evaluations often lack generalizability due to restrictions on data structure and simulation schemes. The application of both methods to several kinds of data and different simulation settings is meant to improve and extend the comparative analyses. Classification and regression studies are examined. Recursive partitioning is executed by two popular tree implementations and one Random Forest implementation. Findings show that multiple imputation produces ambiguous performance results for both simulated and real-life data. Using surrogates instead is a fast and simple way to achieve performance that is only negligibly worse and in many cases even superior. (C) 2012 Published by Elsevier B.V.
Pages: 1552-1565
Page count: 14
Related papers
50 in total
  • [1] Tree-based prediction on incomplete data using imputation or surrogate decisions
    Valdiviezo, H. Cevallos
    Van Aelst, S.
    [J]. INFORMATION SCIENCES, 2015, 311 : 163 - 181
  • [2] On using multiple imputation for exploratory factor analysis of incomplete data
    Nassiri, Vahid
    Lovik, Aniko
    Molenberghs, Geert
    Verbeke, Geert
    [J]. BEHAVIOR RESEARCH METHODS, 2018, 50 (02) : 501 - 517
  • [3] On using multiple imputation for exploratory factor analysis of incomplete data
    Vahid Nassiri
    Anikó Lovik
    Geert Molenberghs
    Geert Verbeke
    [J]. Behavior Research Methods, 2018, 50 : 501 - 517
  • [4] Using multiple imputation for analysis of incomplete data in clinical research
    McCleary, L
    [J]. NURSING RESEARCH, 2002, 51 (05) : 339 - 343
  • [5] Analysis of incomplete longitudinal binary data using multiple imputation
    Li, Xiaoming
    Mehrotra, Devan V.
    Barnard, John
    [J]. STATISTICS IN MEDICINE, 2006, 25 (12) : 2107 - 2124
  • [6] Recursive partitioning for missing data imputation in the presence of interaction effects
    Doove, L. L.
    Van Buuren, S.
    Dusseldorp, E.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 72 : 92 - 104
  • [7] Multiple imputation for incomplete data with semicontinuous variables
    Javaras, KN
    Van Dyk, DA
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2003, 98 (463) : 703 - 715
  • [8] A multiple imputation strategy for incomplete longitudinal data
    Landrum, MB
    Becker, MP
    [J]. STATISTICS IN MEDICINE, 2001, 20 (17-18) : 2741 - 2760
  • [9] Multiple Imputation for Incomplete Data in Epidemiologic Studies
    Harel, Ofer
    Mitchell, Emily M.
    Perkins, Neil J.
    Cole, Stephen R.
    Tchetgen, Eric J. Tchetgen
    Sun, BaoLuo
    Schisterman, Enrique F.
    [J]. AMERICAN JOURNAL OF EPIDEMIOLOGY, 2018, 187 (03) : 576 - 584
  • [10] Multiple imputation of incomplete multilevel data using Heckman selection models
    Munoz, Johanna
    Efthimiou, Orestis
    Audigier, Vincent
    de Jong, Valentijn M. T.
    Debray, Thomas P. A.
    [J]. STATISTICS IN MEDICINE, 2024, 43 (03) : 514 - 533