Recursive partitioning on incomplete data using surrogate decisions and multiple imputation

Cited by: 36
Authors
Hapfelmeier, A. [1]
Hothorn, T. [2]
Ulm, K. [1]
Affiliations
[1] Tech Univ Munich, Inst Med Stat & Epidemiol, D-81675 Munich, Germany
[2] Univ Munich, Inst Stat, D-80539 Munich, Germany
Keywords
Recursive partitioning; Classification and regression trees; Random Forests; Multiple imputation; MICE; Surrogates; Variable importance measures; Regression trees; Missing data; Classification; Prediction; Inference; Discrete; Values
DOI
10.1016/j.csda.2011.09.024
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The occurrence of missing data is a major problem in statistical data analysis. All scientific fields and data of all kinds and sizes are affected by this problem. There are a number of ad hoc solutions, which unfortunately lead to a loss of power, biased inference, underestimation of variability and distorted relationships between variables. A more promising approach of rising popularity is multiple imputation by chained equations (MICE), also known as imputation by full conditional specification (FCS). Alternatives to imputation are given by methods with built-in procedures for handling missing values. These include recursive partitioning by classification and regression trees as well as the corresponding Random Forests. However, there is little literature comparing the two approaches. Existing evaluations often lack generalizability due to restrictions on data structure and simulation schemes. The application of both methods to several kinds of data and different simulation settings is meant to improve and extend the comparative analyses. Classification and regression studies are examined. Recursive partitioning is executed by two popular tree implementations and one Random Forest implementation. Findings show that multiple imputation produces ambiguous performance results for both simulated and real-life data. Using surrogates instead is a fast and simple way to achieve performance that is only negligibly worse and in many cases even superior. (C) 2012 Published by Elsevier B.V.
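For readers unfamiliar with the surrogate approach named in the abstract, the following is a minimal sketch of the general idea, not the implementation evaluated in the paper. It assumes a single primary split of the form "feature <= threshold" and picks, among the other features, the cutoff that most often sends observations to the same side as the primary split. The function names `best_surrogate` and `goes_left`, the toy data, and the fallback rule are all hypothetical. Real tree software typically also considers reversed surrogate directions and ranks several surrogates, which this sketch omits.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]          # None marks a missing value
Surrogate = Tuple[int, float, float]  # (feature index, cutoff, agreement)


def best_surrogate(rows: List[Row], primary: int, threshold: float) -> Optional[Surrogate]:
    """Find the split on another feature that best mimics the primary split."""
    best: Optional[Surrogate] = None
    for j in range(len(rows[0])):
        if j == primary:
            continue
        # Use only rows where both the primary and the candidate are observed.
        pairs = [(r[j], r[primary] <= threshold)
                 for r in rows
                 if r[j] is not None and r[primary] is not None]
        if not pairs:
            continue
        for cutoff in sorted({value for value, _ in pairs}):
            # Share of rows sent to the same side as the primary split.
            agreement = sum((value <= cutoff) == left for value, left in pairs) / len(pairs)
            if best is None or agreement > best[2]:
                best = (j, cutoff, agreement)
    return best


def goes_left(row: Row, primary: int, threshold: float, surrogate: Surrogate) -> bool:
    """Route a row using the primary split, or the surrogate if the primary is missing."""
    if row[primary] is not None:
        return row[primary] <= threshold
    j, cutoff, _ = surrogate
    if row[j] is not None:
        return row[j] <= cutoff
    return True  # assumption: fall back to the left branch if both are missing


# Toy data (hypothetical): feature 0 is the primary split variable.
rows: List[Row] = [
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 1.0],
    [3.0, 30.0, 4.0],
    [4.0, 40.0, 2.0],
    [None, 15.0, 3.0],
    [None, 35.0, 3.0],
]
surrogate = best_surrogate(rows, primary=0, threshold=2.0)
routes = [goes_left(r, 0, 2.0, surrogate) for r in rows]
```

In this toy data, feature 1 reproduces the primary split exactly on the complete rows, so the two rows with a missing primary value are routed by feature 1 and no imputation step is needed.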
Pages: 1552-1565
Page count: 14