Mutual information criterion for feature selection from incomplete data

Cited by: 60
Authors
Qian, Wenbin [1,2]
Shu, Wenhao [3]
Affiliations
[1] Jiangxi Agr Univ, Sch Software, Nanchang 330045, Peoples R China
[2] Beijing Key Lab Knowledge Engn Mat Sci, Beijing 100083, Peoples R China
[3] East China Jiaotong Univ, Sch Informat Engn, Nanchang 330013, Peoples R China
Keywords
Feature selection; Uncertainty measure; Mutual information; Incomplete data; Rough sets; FEATURE SUBSET-SELECTION; ATTRIBUTE REDUCTION; MAX-DEPENDENCY; DISCRETIZATION; ALGORITHMS; RELEVANCE; SYSTEMS;
DOI
10.1016/j.neucom.2015.05.105
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is an important preprocessing step in machine learning and data mining, and the feature evaluation criterion is a key issue in the construction of feature selection algorithms. Mutual information is a widely used criterion in feature selection; it measures the relevance between features and target classes. Mutual information-based feature selection algorithms have been extensively studied, but less effort has been made to investigate feature selection in incomplete data. In this paper, a mutual information criterion built on the tolerance information granules of rough sets is provided for evaluating candidate features in incomplete data. The criterion not only favors the largest mutual information with the target class but also takes into account the redundancy between selected features. We first validate the feasibility of the mutual information. Then an effective mutual information-based feature selection algorithm with a forward greedy strategy is developed for incomplete data. To further accelerate the feature selection process, the selection of candidate features is carried out on a dwindling object set. Compared with existing feature selection algorithms, experimental results on different real data sets show that the proposed algorithm is more effective for feature selection in incomplete data in most cases. (C) 2015 Elsevier B.V. All rights reserved.
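The abstract does not give the paper's formulas, so the following Python sketch only illustrates the general idea under stated assumptions and should not be read as the authors' algorithm. It assumes that missing values are marked with `None`, that two objects are tolerant on an attribute when their values match or either is missing, and that entropy takes one common tolerance-class form, H(B) = -(1/|U|) Σ log2(|T_B(x)|/|U|). The greedy score (relevance minus mean redundancy) is an mRMR-style stand-in. All function names are hypothetical, and the dwindling-object-set speed-up is omitted.

```python
import math

MISSING = None  # assumed marker for a missing value


def tolerant(x, y, attrs):
    """True if rows x and y match on every attribute in attrs,
    treating a missing value as compatible with anything."""
    return all(
        x[a] == y[a] or x[a] is MISSING or y[a] is MISSING for a in attrs
    )


def tolerance_classes(rows, attrs):
    """For each row, the set of row indices tolerant with it on attrs."""
    return [
        frozenset(j for j, y in enumerate(rows) if tolerant(x, y, attrs))
        for x in rows
    ]


def entropy(classes, n):
    """Assumed tolerance-class entropy: -(1/n) * sum(log2(|T(x)| / n))."""
    return -sum(math.log2(len(c) / n) for c in classes) / n


def mutual_information(rows, attrs_a, attrs_b):
    """I(A; B) = H(A) + H(B) - H(A u B) under the assumed entropy."""
    n = len(rows)
    ca = tolerance_classes(rows, attrs_a)
    cb = tolerance_classes(rows, attrs_b)
    # The tolerance class on A u B is the intersection of the two classes.
    joint = [a & b for a, b in zip(ca, cb)]
    return entropy(ca, n) + entropy(cb, n) - entropy(joint, n)


def greedy_select(rows, features, target, k):
    """Forward greedy selection of up to k features (mRMR-style score)."""
    selected = []
    candidates = list(features)
    while candidates and len(selected) < k:

        def score(f):
            relevance = mutual_information(rows, [f], [target])
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(rows, [f], [s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Rows are plain lists, and features and the target are referred to by column index; for example, `greedy_select(rows, [0, 1, 2], 3, 2)` would pick up to two of the first three columns with respect to column 3. Each step recomputes tolerance classes in quadratic time over all objects, which is the cost the paper's dwindling object set is meant to reduce.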
Pages: 210-220
Number of pages: 11