A Feature Subset Selection Method Based On High-Dimensional Mutual Information

Cited by: 25
Authors
Zheng, Yun [1 ,2 ]
Kwoh, Chee Keong [3 ]
Affiliations
[1] Fudan Univ, Inst Dev Biol & Mol Med, Shanghai 200433, Peoples R China
[2] Fudan Univ, Sch Life Sci, Shanghai 200433, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Keywords
feature selection; mutual information; entropy; information theory; Markov blanket; classification; algorithm; networks; prediction; cancer
DOI
10.3390/e13040860
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
Feature selection is an important step in building accurate classifiers and provides a better understanding of the data sets. In this paper, we propose a feature subset selection method based on high-dimensional mutual information. We also propose to use the entropy of the class attribute as a criterion to determine the appropriate subset of features when building classifiers. We prove that if the mutual information between a feature set X and the class attribute Y equals the entropy of Y, then X is a Markov blanket of Y. We show that in some cases, it is infeasible to approximate the high-dimensional mutual information with algebraic combinations of pairwise mutual information in any form. In addition, an exhaustive search of all combinations of features is a prerequisite for finding the optimal feature subsets for classifying these kinds of data sets. We show that our approach outperforms existing filter feature subset selection methods on most of the 24 selected benchmark data sets.
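The two information-theoretic claims in the abstract can be illustrated on a toy case. The sketch below is not the authors' algorithm; it assumes a hypothetical data set in which the class is the XOR of two binary features, and the helper names `entropy` and `mutual_information` are chosen here for illustration. It estimates the quantities empirically and checks that each pairwise mutual information I(X1;Y) and I(X2;Y) vanishes, while the joint mutual information I({X1,X2};Y) reaches the entropy H(Y), which is the stopping criterion the abstract describes.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y); xs may hold tuples of features."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy data (an assumption for illustration): Y = X1 XOR X2,
# with each of the four input combinations observed once.
rows = list(product([0, 1], repeat=2))
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

h_y = entropy(y)                         # entropy of the class attribute
mi_x1 = mutual_information(x1, y)        # pairwise term for X1 alone
mi_x2 = mutual_information(x2, y)        # pairwise term for X2 alone
mi_joint = mutual_information(rows, y)   # high-dimensional term for {X1, X2}

print(f"H(Y)={h_y:.3f}  I(X1;Y)={mi_x1:.3f}  "
      f"I(X2;Y)={mi_x2:.3f}  I(X1,X2;Y)={mi_joint:.3f}")
```

Because both pairwise terms are zero here, no algebraic combination of them can recover the joint value, so a selector that ranks features one at a time would discard both; only evaluating the pair together reveals that the subset fully determines the class.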
Pages: 860-901
Page count: 42
Related papers
50 in total
  • [1] Feature selection, mutual information, and the classification of high-dimensional patterns
    Bonev, Boyan
    Escolano, Francisco
    Cazorla, Miguel
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2008, 11 (3-4) : 309 - 319
  • [2] Feature Selection using Mutual Information for High-dimensional Data Sets
    Nagpal, Arpita
    Gaur, Deepti
    Gaur, Seema
    [J]. SOUVENIR OF THE 2014 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2014, : 45 - 49
  • [3] High-dimensional supervised feature selection via optimized kernel mutual information
    Bi, Ning
    Tan, Jun
    Lai, Jian-Huang
    Suen, Ching Y.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2018, 108 : 81 - 95
  • [4] Gait feature subset selection by mutual information
    Guo, Baofeng
    Nixon, Mark S.
    [J]. 2007 FIRST IEEE INTERNATIONAL CONFERENCE ON BIOMETRICS: THEORY, APPLICATIONS AND SYSTEMS, 2007, : 187 - 192
  • [5] Gait Feature Subset Selection by Mutual Information
    Guo, Baofeng
    Nixon, Mark S.
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2009, 39 (01) : 36 - 46
  • [6] A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
    Song, Qinbao
    Ni, Jingjie
    Wang, Guangtao
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (01) : 1 - 14
  • [7] Feature Subset Selection Approach Based on Fuzzy Rough Set for High-dimensional Data
    Guo, Changyou
    Zheng, Xuefeng
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC), 2014, : 72 - 75
  • [8] Feature subset selection wrapper based on mutual information and rough sets
    Foithong, Sombut
    Pinngern, Ouen
    Attachoo, Boonwat
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (01) : 574 - 584
  • [9] Heterogeneous feature subset selection using mutual information-based feature transformation
    Wei, Min
    Chow, Tommy W. S.
    Chan, Rosa H. M.
    [J]. NEUROCOMPUTING, 2015, 168 : 706 - 718
  • [10] Implementation of FAST Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
    Shilu, Smit
    Sheth, Kushal
    Mehul, Ekata
    [J]. PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABLE DEVELOPMENT ICT4SD 2015, VOL 2, 2016, 409 : 203 - 213