Multiple Imputation and Ensemble Learning for Classification with Incomplete Data

Cited by: 8
Authors
Cao Truong Tran [1, 2]
Zhang, Mengjie [1 ]
Andreae, Peter [1 ]
Xue, Bing [1 ]
Lam Thu Bui [2 ]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, POB 600, Wellington 6140, New Zealand
[2] Le Qui Don Tech Univ, Fac Informat Technol, Hanoi, Vietnam
Keywords
Incomplete data; Multiple imputation; Ensemble learning; Classification; Missing data
DOI
10.1007/978-3-319-49049-6_29
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Missing values are a common issue in many real-world datasets, so coping with such datasets is an essential requirement for classification: inadequate treatment of missing values often leads to large classification errors. One of the most popular ways to address incomplete data is to use imputation methods to fill missing fields with plausible values. Multiple imputation, which fills each missing field with a set of plausible values, is a powerful approach to dealing with incomplete data, but it is mainly used for statistical analysis. Ensemble learning, which constructs a set of classifiers instead of a single classifier, has proven capable of improving classification accuracy, but it has mainly been applied to complete data. This paper proposes a combination of multiple imputation and ensemble learning to build an ensemble of classifiers for incomplete data classification tasks. A multiple imputation method is used to generate a set of diverse imputed datasets, which are then used to build a set of diverse classifiers. Experiments on ten benchmark datasets use a decision tree as the classification algorithm and compare the proposed approach with two other popular approaches to dealing with incomplete data. The results show that, in almost all cases, the proposed method achieves significantly better classification accuracy than the other methods.
Pages: 401-415
Number of pages: 15
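
The pipeline described in the abstract (several imputed copies of the training data, one classifier per copy, a combined prediction) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple random draw from each feature's observed values as a stand-in for a real multiple imputation method, a one-level decision tree (stump) as a stand-in for the decision tree learner, majority voting as the combination rule, and at least one observed value per feature. All function names (`impute_once`, `train_stump`, `build_ensemble`, `predict`) are hypothetical.

```python
import random
from collections import Counter


def impute_once(rows, rng):
    """Return a copy of rows with each None replaced by a random draw
    from the observed values of the same feature (assumed stand-in for
    one pass of a multiple imputation method)."""
    observed = [[v for v in col if v is not None] for col in zip(*rows)]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(row)]
        for row in rows
    ]


def train_stump(X, y):
    """Fit a one-level decision tree: pick the (feature, threshold) split
    with the highest training accuracy. Returns (feature, threshold,
    left_label, right_label)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            correct = left.count(left_label) + right.count(right_label)
            if best is None or correct > best[0]:
                best = (correct, j, t, left_label, right_label)
    if best is None:
        # No valid split exists: always predict the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def build_ensemble(X, y, n_imputations=5, seed=0):
    """Create n_imputations imputed datasets and train one stump on each."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_imputations):
        imputed = impute_once(X, rng)
        ensemble.append((imputed, train_stump(imputed, y)))
    return ensemble


def predict(ensemble, instance, seed=0):
    """Classify a possibly incomplete instance by majority vote. Missing
    fields are filled per ensemble member with a draw from that member's
    imputed training data (an assumption of this sketch)."""
    rng = random.Random(seed)
    votes = []
    for imputed, (j, t, left_label, right_label) in ensemble:
        filled = [
            v if v is not None else rng.choice([row[k] for row in imputed])
            for k, v in enumerate(instance)
        ]
        votes.append(left_label if filled[j] <= t else right_label)
    return Counter(votes).most_common(1)[0][0]


# Toy data: feature 0 separates the classes, feature 1 has missing values.
X = [
    [1.0, 5.0], [1.2, None], [0.9, 5.0], [1.1, 7.0],
    [3.0, 5.0], [3.2, 7.0], [2.9, None], [3.1, 7.0],
]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]

ensemble = build_ensemble(X, y, n_imputations=5, seed=0)
label = predict(ensemble, [1.0, None])
```

In this toy example every imputed dataset leads to the same split on feature 0, so the vote is unanimous. With a stochastic imputation model and a full decision tree, the imputed datasets would differ more, which is the source of classifier diversity the abstract refers to.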