Feature Subset Selection based on Redundancy Maximized Clusters

Cited by: 3
Authors
Tarek, Md Hasan [1]
Kadir, Md Eusha [1]
Sharmin, Sadia [2]
Sajib, Abu Ashfaqur [3]
Ali, Amin Ahsan [2]
Shoyaib, Mohammad [1]
Affiliations
[1] Univ Dhaka, Inst Informat Technol, Dhaka, Bangladesh
[2] Islamic Univ Technol, Comp Sci & Engn, Gazipur, Bangladesh
[3] Univ Dhaka, Genet Engn & Biotechnol, Dhaka, Bangladesh
Keywords
Clustering; Normalized mutual information; Bias correction; Feature selection; CHRONIC LYMPHOCYTIC-LEUKEMIA; MUTUAL INFORMATION; ALGORITHMS
DOI
10.1109/ICMLA52953.2021.00087
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection plays a vital role in data mining and machine learning for analyzing high-dimensional data. A popular criterion for feature selection is Mutual Information (MI), as it can capture both linear and non-linear relationships among the features and the class variable. Existing MI-based feature selection methods use different approximation techniques to capture the joint performance of features and their relationship with the classes, and to eliminate redundant features. However, these approximations may fail to select the optimal set of features, especially when the feature dimension is high. Besides, due to the absence of an appropriate search strategy, these MI-based approximations may select unnecessary features. To address these issues, we propose a method named Feature Selection based on Redundancy maximized Clusters (FSRC), which creates clusters of redundant features and then selects a subset of representative features from each cluster. We also propose using bias-corrected normalized MI for this purpose. Rigorous experiments on thirty benchmark datasets demonstrate that FSRC outperforms existing state-of-the-art methods in most cases. Moreover, FSRC is applied to three gene expression datasets, which are high-dimensional but have few samples. The results show that FSRC can select features (genes) that are not only discriminating but also biologically relevant.
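The general idea in the abstract, grouping mutually redundant features and keeping one representative per group, can be illustrated with a minimal sketch. This is not the FSRC algorithm itself: the abstract does not give its clustering rule, its normalization, or its bias correction, so everything below is an assumption made for illustration. In particular, the function names, the greedy single-pass clustering, the min-entropy normalization, the `threshold` parameter, and the choice of the most class-relevant feature as representative are all hypothetical, features are assumed to be discrete, and no bias correction is applied.

```python
from collections import Counter
from math import log


def entropy(values):
    """Plug-in (uncorrected) Shannon entropy of a discrete sequence, in nats."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from empirical counts."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs, ys):
    """MI divided by min(H(X), H(Y)); this normalization is an assumption."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


def cluster_and_select(features, labels, threshold=0.8):
    """Greedy illustration of redundancy clustering (hypothetical, not FSRC).

    features: dict mapping feature name -> list of discrete values
    labels:   list of class labels, same length as each feature column
    Returns (selected_names, clusters); each cluster lists its
    representative first.
    """
    # Relevance of each feature to the class variable.
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    # Visit the most relevant features first; break ties by name.
    ordered = sorted(features, key=lambda name: (-relevance[name], name))

    clusters = []
    for name in ordered:
        for cluster in clusters:
            representative = cluster[0]
            # Join a cluster when the feature is redundant with its representative.
            if normalized_mi(features[name], features[representative]) >= threshold:
                cluster.append(name)
                break
        else:
            # Not redundant with any existing representative: start a new cluster.
            clusters.append([name])
    return [cluster[0] for cluster in clusters], clusters
```

Because features are visited in order of decreasing relevance, the first member of each cluster is its most class-relevant feature, and later features that carry nearly the same information are absorbed rather than selected. With small samples, plug-in MI estimates are biased upward, which is presumably the motivation for the bias correction the abstract mentions.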
Pages: 521-526
Number of pages: 6