Mutual information criterion for feature selection from incomplete data

Cited: 60
Authors
Qian, Wenbin [1 ,2 ]
Shu, Wenhao [3 ]
Affiliations
[1] Jiangxi Agr Univ, Sch Software, Nanchang 330045, Peoples R China
[2] Beijing Key Lab Knowledge Engn Mat Sci, Beijing 100083, Peoples R China
[3] East China Jiaotong Univ, Sch Informat Engn, Nanchang 330013, Peoples R China
Keywords
Feature selection; Uncertainty measure; Mutual information; Incomplete data; Rough sets; FEATURE SUBSET-SELECTION; ATTRIBUTE REDUCTION; MAX-DEPENDENCY; DISCRETIZATION; ALGORITHMS; RELEVANCE; SYSTEMS;
DOI
10.1016/j.neucom.2015.05.105
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Feature selection is an important preprocessing step in machine learning and data mining, and the feature evaluation criterion is a key issue in constructing feature selection algorithms. Mutual information, which measures the relevance between features and target classes, is one of the most widely used criteria in feature selection. Although mutual information-based feature selection algorithms have been extensively studied, little effort has been devoted to feature selection in incomplete data. In this paper, combined with tolerance information granules in rough sets, a mutual information criterion is proposed for evaluating candidate features in incomplete data; it not only favors features with the largest mutual information with the target class but also takes into account the redundancy among the selected features. We first validate the feasibility of the mutual information criterion. Then an effective mutual information-based feature selection algorithm with a forward greedy strategy is developed for incomplete data. To further accelerate the feature selection process, the evaluation of candidate features is performed on a dwindling object set. Experimental results on several real data sets show that, compared with existing feature selection algorithms, the proposed algorithm is more effective for feature selection in incomplete data in most cases. (C) 2015 Elsevier B.V. All rights reserved.
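The pipeline the abstract describes (a tolerance relation over incomplete data, an entropy and mutual information defined on the resulting granules, and forward greedy selection) can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: the log-based granule entropy form and the stopping rule are common rough-set choices assumed here, and all names (`tolerance_class`, `greedy_select`, the toy table) are hypothetical.

```python
from math import log2

MISSING = None  # wildcard for an unknown attribute value

def tolerance_class(data, i, cols):
    """Indices of objects compatible with object i on the given columns;
    a MISSING value is treated as matching anything (tolerance relation)."""
    return [j for j in range(len(data))
            if all(data[i][c] is MISSING or data[j][c] is MISSING
                   or data[i][c] == data[j][c] for c in cols)]

def granule_entropy(data, cols):
    """One common entropy form for tolerance granules in incomplete data:
    H(cols) = -(1/n) * sum_i log2(|S_cols(x_i)| / n)."""
    n = len(data)
    return -sum(log2(len(tolerance_class(data, i, cols)) / n)
                for i in range(n)) / n

def mutual_information(data, feats, dec):
    """MI(feats; dec) = H(feats) + H(dec) - H(feats ∪ dec)."""
    return (granule_entropy(data, feats) + granule_entropy(data, [dec])
            - granule_entropy(data, feats + [dec]))

def greedy_select(data, candidates, dec, eps=1e-9):
    """Forward greedy search: repeatedly add the candidate feature that
    most increases MI with the decision; stop when no gain remains.
    Evaluating subsets jointly penalizes redundant candidates, since a
    feature duplicating already-selected ones adds little MI."""
    selected, best_mi = [], 0.0
    while candidates:
        gains = {f: mutual_information(data, selected + [f], dec)
                 for f in candidates}
        f_best = max(gains, key=gains.get)
        if gains[f_best] <= best_mi + eps:
            break
        selected.append(f_best)
        best_mi = gains[f_best]
        candidates = [f for f in candidates if f != f_best]
    return selected

# Toy incomplete table: columns 0-2 are features, column 3 the decision.
table = [
    [1, 0, MISSING, 'yes'],
    [1, 1, 0,       'yes'],
    [0, MISSING, 1, 'no'],
    [0, 1, 1,       'no'],
]
print(greedy_select(table, [0, 1, 2], dec=3))  # → [0]
```

Feature 0 alone already determines the decision on this toy table, so the greedy loop stops after one pick; the paper's dwindling-object-set acceleration would additionally shrink the set of rows scanned at each round, which this sketch omits.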
Pages: 210-220 (11 pages)
Related Papers (50 total)
  • [31] Frenay, B., Doquire, G., Verleysen, M. Is mutual information adequate for feature selection in regression? Neural Networks, 2013, 48: 1-7.
  • [32] Sharmin, S., Ali, A. A., Khan, M. A. H., Shoyaib, M. Feature Selection and Discretization based on Mutual Information. 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (ICIVPR), 2017.
  • [33] Wang, J., Zhang, L. Discriminant Mutual Information for Text Feature Selection. Database Systems for Advanced Applications (DASFAA 2021), Pt II, 2021, 12682: 136-151.
  • [34] Nguyen, H. B., Xue, B., Andreae, P. Mutual information for feature selection: estimation or counting? Evolutionary Intelligence, 2016, 9(3): 95-110.
  • [35] Celik, C., Bilge, H. S. Feature Selection with Weighted Conditional Mutual Information. Journal of the Faculty of Engineering and Architecture of Gazi University, 2015, 30(4): 585-596.
  • [36] Guo, B., Nixon, M. S. Gait Feature Subset Selection by Mutual Information. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2009, 39(1): 36-46.
  • [37] Sulaiman, M. A., Labadin, J. Feature Selection with Mutual Information for Regression Problems. 2015 9th International Conference on IT in Asia (CITA), 2015.
  • [38] Fan, X.-L. PCA based on mutual information for feature selection. Northeast University, (28).
  • [39] Ge, H., Hu, T. Genetic algorithm for feature selection with mutual information. 2014 Seventh International Symposium on Computational Intelligence and Design (ISCID 2014), Vol 1, 2014: 116-119.
  • [40] Gao, W., Hu, L., Zhang, P. Feature Selection by Maximizing Part Mutual Information. 2018 International Conference on Signal Processing and Machine Learning (SPML 2018), 2018: 120-127.