Accurate Tree-based Missing Data Imputation and Data Fusion within the Statistical Learning Paradigm

Cited by: 0
Authors
Antonio D’Ambrosio
Massimo Aria
Roberta Siciliano
Affiliations
[1] University of Naples Federico II, Department of Mathematics and Statistics
Source
Journal of Classification | 2012 / Vol. 29
Keywords
Data editing; Tree-based methods; Boosting algorithm; FAST algorithm; Incremental imputation; Generalization error;
DOI
Not available
Abstract
The framework of this paper is statistical data editing, specifically how to edit or impute missing or contradictory data and how to merge two independent data sets that each lack some information. Assuming a missing at random mechanism, the paper provides an accurate tree-based methodology for both missing data imputation and data fusion that is justified within the Statistical Learning Theory of Vapnik. It considers both an incremental variable imputation method to improve computational efficiency and boosted trees to gain prediction accuracy with respect to other methods. As a result, the best approximation of the structural risk (also known as irreducible error) is reached, thus reducing the generalization (or prediction) error of imputation to a minimum. Moreover, the methodology is distribution free: it holds independently of the underlying probability law generating the missing data values. Performance is analyzed through simulation case studies and real world applications.
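The incremental idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a purely numeric table, orders variables by their number of missing values (an assumed simplification of the paper's ordering), and swaps in a single depth-1 regression tree where the paper uses boosted trees. The names `fit_stump`, `predict_stump` and `incremental_impute` are hypothetical, and every column is assumed to have at least one observed value.

```python
from statistics import mean


def fit_stump(X, y):
    """Fit a depth-1 regression tree: choose the (feature, threshold) split
    that minimizes the summed squared error around the two leaf means."""
    best = (float("inf"), None, None, mean(y), mean(y))
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]  # (feature index, threshold, left mean, right mean)


def predict_stump(stump, row):
    j, t, ml, mr = stump
    if j is None:  # no usable split: fall back to the overall mean
        return ml
    return ml if row[j] <= t else mr


def incremental_impute(data):
    """Impute None entries one variable at a time. Each imputed variable
    joins the predictor set used for the variables that follow."""
    rows = [list(r) for r in data]  # work on a copy
    n_cols = len(rows[0])
    n_missing = [sum(r[c] is None for r in rows) for c in range(n_cols)]
    complete = [c for c in range(n_cols) if n_missing[c] == 0]
    # Assumed ordering: fewest missing values first.
    todo = sorted((c for c in range(n_cols) if n_missing[c] > 0),
                  key=lambda c: n_missing[c])
    for c in todo:
        train = [r for r in rows if r[c] is not None]
        X = [[r[k] for k in complete] for r in train]
        y = [r[c] for r in train]
        stump = fit_stump(X, y)
        for r in rows:
            if r[c] is None:
                r[c] = predict_stump(stump, [r[k] for k in complete])
        complete.append(c)
    return rows


# Toy data, invented for illustration only.
toy = [
    [1.0, 10.0, None],
    [2.0, 12.0, 5.0],
    [8.0, None, 9.0],
    [9.0, 30.0, 9.5],
    [1.5, 11.0, 5.5],
]
imputed = incremental_impute(toy)
```

The design point the sketch tries to convey is that the predictor set grows as columns are completed, so later imputations can draw on earlier ones. Replacing the stump with a boosted ensemble would bring it closer to the approach the abstract describes.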
Pages: 227 - 258
Page count: 31
Related papers
50 in total
  • [1] Accurate Tree-based Missing Data Imputation and Data Fusion within the Statistical Learning Paradigm
    D'Ambrosio, Antonio
    Aria, Massimo
    Siciliano, Roberta
    [J]. JOURNAL OF CLASSIFICATION, 2012, 29 (02) : 227 - 258
  • [2] Tree-based Approach to Missing Data Imputation
    Vateekul, Peerapon
    Sarinnapakorn, Kanoksri
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 70 - +
  • [3] Boosted incremental tree-based imputation of missing data
    Siciliano, Roberta
    Aria, Massimo
    D'Ambrosio, Antonio
    [J]. DATA ANALYSIS, CLASSIFICATION AND THE FORWARD SEARCH, 2006, : 271 - +
  • [4] Incremental Tree-Based Missing Data Imputation with Lexicographic Ordering
    Claudio Conversano
    Roberta Siciliano
    [J]. Journal of Classification, 2009, 26 : 361 - 379
  • [5] Incremental Tree-Based Missing Data Imputation with Lexicographic Ordering
    Conversano, Claudio
    Siciliano, Roberta
    [J]. JOURNAL OF CLASSIFICATION, 2009, 26 (03) : 361 - 379
  • [6] Robust tree-based incremental imputation method for data fusion
    D'Ambrosio, Antonio
    Aria, Massimo
    Siciliano, Roberta
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS VII, PROCEEDINGS, 2007, 4723 : 174 - +
  • [7] Evaluating a sequential tree-based procedure for multivariate imputation of complex missing data structures
    Borgoni, Riccardo
    Berrington, Ann
    [J]. QUALITY & QUANTITY, 2013, 47 (04) : 1991 - 2008
  • [8] Evaluating a sequential tree-based procedure for multivariate imputation of complex missing data structures
    Riccardo Borgoni
    Ann Berrington
    [J]. Quality & Quantity, 2013, 47 : 1991 - 2008
  • [9] A decision tree-based missing value imputation technique for data pre-processing
    Rahman, Md. Geaur
    Islam, Md. Zahidul
    [J]. Conferences in Research and Practice in Information Technology Series, 2010, 121 : 41 - 50
  • [10] Missing data incremental imputation through tree based methods
    Conversano, C
    Cappelli, C
    [J]. COMPSTAT 2002: PROCEEDINGS IN COMPUTATIONAL STATISTICS, 2002, : 455 - 460