Bagging and Feature Selection for Classification with Incomplete Data

Cited: 5
Authors
Cao Truong Tran [1]
Zhang, Mengjie [1]
Andreae, Peter [1]
Xue, Bing [1]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, POB 600, Wellington 6140, New Zealand
Keywords
Incomplete data; Ensemble; Feature selection; Classification; Particle swarm optimisation; C4.5; REPTree; MISSING DATA; ENSEMBLE;
DOI
10.1007/978-3-319-55849-3_31
CLC number: TP301 [Theory, Methods]
Discipline code: 081202
Abstract
Missing values are an unavoidable issue in many real-world datasets. Dealing with missing values is an essential requirement in classification problems, because inadequate treatment of missing values often leads to large classification errors. Some classifiers can work directly with incomplete data, but they often produce large classification errors and generate complex models. Feature selection and bagging have been used successfully to improve classification, but they are mainly applied to complete data. This paper proposes a combination of bagging and feature selection to improve classification with incomplete data. To this end, a wrapper-based feature selection method that can work directly with incomplete data is used to select suitable feature subsets for bagging. Experiments on eight incomplete datasets were designed to compare the proposed method with three other popular methods that can deal with incomplete data, using C4.5/REPTree as classifiers and Particle Swarm Optimisation as the search technique in feature selection. Results show that the combination of bagging and feature selection not only achieves better classification accuracy than the other methods but also generates less complex models than bagging alone.
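The general scheme described in the abstract can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the PSO-based wrapper search is replaced by random feature subsets, and the C4.5/REPTree base learners are replaced by a toy nearest-centroid classifier that tolerates NaNs. All names (`NanCentroidClassifier`, `fit_bagging`, `predict_bagging`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class NanCentroidClassifier:
    """Toy base learner that tolerates missing values: one centroid per
    class computed with nanmean; distances ignore a sample's NaN entries.
    (Stands in for the paper's C4.5/REPTree base classifiers.)"""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array(
            [np.nanmean(X[y == c], axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        preds = []
        for x in X:
            mask = ~np.isnan(x)  # compare only on observed features
            d = np.sum((self.centroids_[:, mask] - x[mask]) ** 2, axis=1)
            preds.append(self.classes_[np.argmin(d)])
        return np.array(preds)

def fit_bagging(X, y, n_bags=15, subset_size=2):
    """Each bag pairs a bootstrap sample with a feature subset. The subset
    is chosen randomly here, standing in for the wrapper (PSO) search."""
    n, n_feats = X.shape
    models = []
    for _ in range(n_bags):
        rows = rng.integers(0, n, size=n)  # bootstrap resample of the rows
        feats = rng.choice(n_feats, size=subset_size, replace=False)
        clf = NanCentroidClassifier().fit(X[rows][:, feats], y[rows])
        models.append((feats, clf))
    return models

def predict_bagging(models, X):
    """Majority vote over the per-bag predictions."""
    votes = np.stack([clf.predict(X[:, feats]) for feats, clf in models])
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Demo: two well-separated Gaussian classes with ~10% of values missing.
n_per = 60
X = np.vstack([rng.normal(0.0, 1.0, (n_per, 4)),
               rng.normal(4.0, 1.0, (n_per, 4))])
y = np.repeat([0, 1], n_per)
X[rng.random(X.shape) < 0.1] = np.nan  # inject missing values

models = fit_bagging(X, y)
acc = np.mean(predict_bagging(models, X) == y)
print(f"training accuracy: {acc:.2f}")
```

Because each base learner sees only a feature subset, the ensemble members are more diverse and individually simpler, which is the intuition behind combining feature selection with bagging.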
Pages: 471–486 (16 pages)