Recursive partitioning on incomplete data using surrogate decisions and multiple imputation

Cited by: 36
Authors
Hapfelmeier, A. [1]
Hothorn, T. [2]
Ulm, K. [1]
Affiliations
[1] Tech Univ Munich, Inst Med Stat & Epidemiol, D-81675 Munich, Germany
[2] Univ Munich, Inst Stat, D-80539 Munich, Germany
Keywords
Recursive partitioning; Classification and regression trees; Random Forests; Multiple imputation; MICE; Surrogates; VARIABLE IMPORTANCE MEASURES; REGRESSION TREES; MISSING DATA; CLASSIFICATION; PREDICTION; INFERENCE; DISCRETE; VALUES
DOI
10.1016/j.csda.2011.09.024
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The occurrence of missing data is a major problem in statistical data analysis. All scientific fields and data of all kinds and sizes are affected by this problem. There are a number of ad hoc solutions, which unfortunately lead to a loss of power, biased inference, underestimation of variability and distorted relationships between variables. A more promising approach of rising popularity is multiple imputation by chained equations (MICE), also known as imputation by full conditional specification (FCS). Alternatives to imputation are given by methods with built-in procedures for handling missing values. These include recursive partitioning by classification and regression trees as well as corresponding Random Forests. However, there is little literature comparing the two approaches. Existing evaluations often lack generalizability due to restrictions on data structure and simulation schemes. The application of both methods to several kinds of data and different simulation settings is meant to improve and extend the comparative analyses. Classification and regression studies are examined. Recursive partitioning is executed by two popular tree implementations and one Random Forest implementation. Findings show that multiple imputation produces ambiguous performance results for both simulated and real-life data. Using surrogates instead is a fast and simple way to achieve performance that is only negligibly worse and in many cases even superior. (C) 2012 Published by Elsevier B.V.
Pages: 1552-1565
Number of pages: 14
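
The surrogate idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation or that of any tree package; the function names (`goes_left`, `best_surrogate`, `route`) and the `default_left` fallback are hypothetical. The sketch assumes numeric predictors, marks missing values with `None`, and only considers surrogate splits that point in the same direction as the primary split, which is a simplification of what CART-style software does.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # None marks a missing value


def goes_left(value: float, threshold: float) -> bool:
    """A split sends an observation left when its value is <= threshold."""
    return value <= threshold


def best_surrogate(
    rows: List[Row], primary: int, threshold: float
) -> Optional[Tuple[int, float, float]]:
    """Find the (column, threshold, agreement) that best mimics the primary split.

    Agreement is the share of rows, observed on both variables, that the
    candidate split sends to the same side as the primary split.
    """
    best: Optional[Tuple[int, float, float]] = None
    for col in range(len(rows[0])):
        if col == primary:
            continue
        usable = [r for r in rows if r[primary] is not None and r[col] is not None]
        if not usable:
            continue
        for cand in sorted({r[col] for r in usable}):
            matches = sum(
                goes_left(r[col], cand) == goes_left(r[primary], threshold)
                for r in usable
            )
            agreement = matches / len(usable)
            if best is None or agreement > best[2]:
                best = (col, cand, agreement)
    return best


def route(
    row: Row,
    primary: int,
    threshold: float,
    surrogate: Optional[Tuple[int, float, float]],
    default_left: bool = True,
) -> bool:
    """Return True if the row goes left, using the surrogate when needed."""
    if row[primary] is not None:
        return goes_left(row[primary], threshold)
    if surrogate is not None and row[surrogate[0]] is not None:
        return goes_left(row[surrogate[0]], surrogate[1])
    # Assumption: fall back to a fixed direction when everything is missing.
    return default_left
```

In this sketch an observation with a missing primary variable is routed by the surrogate without any imputation step, which is the contrast with MICE that the abstract draws.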