Tree-based prediction on incomplete data using imputation or surrogate decisions

Cited by: 50
Authors
Valdiviezo, H. Cevallos [1]
Van Aelst, S. [1,2]
Affiliations
[1] Univ Ghent, Dept Appl Math Comp Sci & Stat, B-9000 Ghent, Belgium
[2] Katholieke Univ Leuven, Dept Math, Sect Stat, B-3001 Louvain, Belgium
Keywords
Prediction; Missing data; Surrogate decision; Multiple imputation; Conditional inference tree; MICE
DOI
10.1016/j.ins.2015.03.018
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The goal is to investigate the prediction performance of tree-based techniques when the available training data contain features with missing values. Future test cases may also contain missing values, so the methods should be able to generate predictions for such test cases. The missing values are handled either by using surrogate decisions within the trees or by combining an imputation method with a tree-based method. Missing values generated according to missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) mechanisms are considered with various fractions of missing data. Imputation models are built in the learning phase and do not make use of the response variable, so that the resulting procedures can predict individual incomplete test cases. In the empirical comparison, both classification and regression problems are considered using simulated and real-life datasets. The performance is evaluated by the misclassification rate of predictions and the mean squared prediction error, respectively. Overall, our results show that for smaller fractions of missing data an ensemble method combined with surrogates or single imputation suffices. For moderate to large fractions of missing values, ensemble methods based on conditional inference trees combined with multiple imputation show the best performance, while conditional bagging using surrogates is a good alternative for high-dimensional prediction problems. Theoretical results confirm the potentially better prediction performance of multiple imputation ensembles. (c) 2015 Elsevier Inc. All rights reserved.
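The "impute, then predict with a tree ensemble" idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: per-feature mean imputation stands in for the single and multiple imputation methods they study, bagged depth-1 regression trees stand in for conditional inference tree ensembles, and all function names and the toy data are hypothetical. What it does preserve is the constraint stated in the abstract: the imputation model is fitted in the learning phase from the features only, without the response, so it can be applied to an individual incomplete test case.

```python
import random
from statistics import mean

def fit_mean_imputer(X):
    """Learn per-feature means from observed training values (response not used)."""
    return [mean(r[j] for r in X if r[j] is not None) for j in range(len(X[0]))]

def impute(row, means):
    """Fill missing entries (None) of one case with the stored training means."""
    return [means[j] if v is None else v for j, v in enumerate(row)]

def fit_stump(X, y):
    """Depth-1 regression tree: pick the split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({r[j] for r in X}):
            left = [yi for r, yi in zip(X, y) if r[j] <= t]
            right = [yi for r, yi in zip(X, y) if r[j] > t]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split, e.g. all bootstrap rows identical
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]

def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    """Bagging: fit one stump per bootstrap resample of the imputed training data."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict(trees, means, row):
    """Impute one (possibly incomplete) test case, then average the stump outputs."""
    filled = impute(row, means)
    return mean(ml if filled[j] <= t else mr for j, t, ml, mr in trees)

# Hypothetical toy regression data; None marks a missing feature value.
X_train = [[1.0, 5.0], [2.0, None], [3.0, 4.0], [None, 6.0],
           [8.0, 5.0], [9.0, None], [10.0, 4.0], [11.0, 6.0]]
y_train = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]

means = fit_mean_imputer(X_train)                     # learning phase, features only
X_filled = [impute(r, means) for r in X_train]
trees = fit_bagged_stumps(X_filled, y_train)

print(predict(trees, means, [10.0, None]))            # incomplete test case
```

Swapping `fit_mean_imputer` for several stochastic imputations and averaging the resulting ensembles would move the sketch closer to the multiple imputation ensembles the abstract describes, but that extension is not shown here.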
Pages: 163 - 181
Page count: 19
Related papers
50 items in total
  • [41] A Tree-Based Approach to Data Flow Proofs
    Hoenicke, Jochen
    Nutz, Alexander
    Podelski, Andreas
    VERIFIED SOFTWARE: THEORIES, TOOLS, AND EXPERIMENTS, (VSTTE 2018), 2018, 11294 : 1 - 16
  • [42] Tree-Based Models for Political Science Data
    Montgomery, Jacob M.
    Olivella, Santiago
    AMERICAN JOURNAL OF POLITICAL SCIENCE, 2018, 62 (03) : 729 - 744
  • [43] A study of tree-based control flow prediction schemes
    Cyril, B
    Franklin, M
    FOURTH INTERNATIONAL CONFERENCE ON HIGH-PERFORMANCE COMPUTING, PROCEEDINGS, 1997, : 28 - 33
  • [44] XGBoost: a tree-based approach for traffic volume prediction
    Lartey, Benjamin
    Homaifar, Abdollah
    Girma, Abenezer
    Karimoddini, Ali
    Opoku, Daniel
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 1280 - 1286
  • [45] Protein pKa Prediction by Tree-Based Machine Learning
    Chen, Ada Y.
    Lee, Juyong
    Damjanovic, Ana
    Brooks, Bernard R.
    JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2022, 18 (04) : 2673 - 2686
  • [46] Tree-Based Feature Transformation for Purchase Behavior Prediction
    Hou, Chunyan
    Chen, Chen
    Wang, Jinsong
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (05) : 1441 - 1444
  • [47] On using multiple imputation for exploratory factor analysis of incomplete data
    Nassiri, Vahid
    Lovik, Aniko
    Molenberghs, Geert
    Verbeke, Geert
    BEHAVIOR RESEARCH METHODS, 2018, 50 (02) : 501 - 517
  • [48] Correspondence Analysis with Incomplete Paired Data using Bayesian Imputation
    de Tibeiro, Jules J. S.
    Murdoch, Duncan J.
    BAYESIAN ANALYSIS, 2010, 5 (03) : 519 - 532
  • [49] Imputation of Incomplete Motion Data Using Hidden Markov Models
    Uvarov, V. E.
    Popov, A. A.
    Gultyaeva, T. A.
    XII INTERNATIONAL SCIENTIFIC AND TECHNICAL CONFERENCE APPLIED MECHANICS AND SYSTEMS DYNAMICS, 2019, 1210
  • [50] Imputation of incomplete data using adaptive ellipsoids with linear regression
    Yao, Leehter
    Weng, Kuei-Sung
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2015, 29 (01) : 253 - 265