Clustering-based feature subset selection with analysis on the redundancy-complementarity dimension

Times cited: 9
Authors
Chen, Zhijun [1]
Chen, Qiushi [2]
Zhang, Yishi [3]
Zhou, Lei [4]
Jiang, Junfeng [5]
Wu, Chaozhong [1]
Huang, Zhen [1]
Affiliations
[1] Wuhan Univ Technol, Intelligent Transportat Syst Res Ctr, Wuhan 430000, Peoples R China
[2] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430000, Peoples R China
[3] Wuhan Univ Technol, Sch Management, Wuhan 430000, Peoples R China
[4] Guangdong Univ Technol, Sch Management, Guangzhou 510006, Peoples R China
[5] Wuhan Technol & Business Univ, Coll Artificial Intelligence, Wuhan 430073, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature selection; Redundancy; Complementarity; Clustering; Minimum spanning tree; Mutual information; Classification; Dependency; Relevance
DOI
10.1016/j.comcom.2021.01.005
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In the era of big data, dimensionality reduction plays an extremely important role in many fields driven by machine learning and data mining techniques. Existing information-theoretic feature selection algorithms generally reduce dimensionality by selecting the features with maximum class relevance and minimum redundancy, but they largely overlook the complementary correlations among features and sometimes handle them improperly. This paper proposes a novel feature subset selection algorithm called Clustering-based Feature Selection with Redundancy-Complementarity Analysis (CFSRCA). The proposed algorithm consists of two main steps: (a) selecting the candidate class-relevant features, and (b) selecting the representative features. In the latter step, representative features are defined as the features with minimum redundancy and maximum complementarity, and a clustering method based on the minimum spanning tree (MST) is proposed to distinguish them effectively. To validate the effectiveness of CFSRCA, three competing feature selection algorithms (ReliefF, CFS, and FOU) and four well-known classifiers (C4.5, SVM, kNN, and NBC) are used in classification experiments on eight datasets. The experimental results verify the effectiveness of the proposed feature subset selection algorithm.
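The two-step procedure summarized above can be illustrated with a small, self-contained sketch. It is not the authors' CFSRCA implementation. The relevance filter (keep features whose mutual information I(X;C) with the class exceeds a threshold), the use of interaction information I(X;Y) - I(X;Y|C) as the redundancy-complementarity edge weight (positive read as redundancy, negative as complementarity), and the rule of cutting non-redundant spanning-tree edges to form clusters are all assumptions made for illustration. The names `select_features`, `relevance_threshold`, and `eps` are hypothetical, and features are assumed to be already discretized.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(*cols):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum(k / n * log2(k / n) for k in counts.values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, c):
    """I(X;Y|C) = H(X,C) + H(Y,C) - H(X,Y,C) - H(C)."""
    return entropy(x, c) + entropy(y, c) - entropy(x, y, c) - entropy(c)


def interaction_info(x, y, c):
    """I(X;Y) - I(X;Y|C): > 0 suggests redundancy, < 0 complementarity."""
    return mutual_info(x, y) - cond_mutual_info(x, y, c)


def _find(parent, f):
    """Union-find root lookup with path halving."""
    while parent[f] != f:
        parent[f] = parent[parent[f]]
        f = parent[f]
    return f


def select_features(features, labels, relevance_threshold=0.01, eps=1e-9):
    """Hypothetical two-step selector; `features` maps name -> discrete column."""
    # Step (a): keep candidates whose I(X;C) exceeds the (assumed) threshold.
    relevance = {name: mutual_info(col, labels) for name, col in features.items()}
    candidates = [f for f in features if relevance[f] > relevance_threshold]

    # Step (b): weight every candidate pair by its interaction information.
    edges = [(interaction_info(features[a], features[b], labels), a, b)
             for a, b in combinations(candidates, 2)]

    # Kruskal on descending weight: a maximum spanning tree over redundancy
    # (equivalently, a minimum spanning tree over the negated weights).
    parent = {f: f for f in candidates}
    tree = []
    for w, a, b in sorted(edges, reverse=True):
        ra, rb = _find(parent, a), _find(parent, b)
        if ra != rb:
            parent[ra] = rb
            tree.append((w, a, b))

    # Cut tree edges that are not redundant (w <= eps); components become clusters.
    parent = {f: f for f in candidates}
    for w, a, b in tree:
        if w > eps:
            parent[_find(parent, a)] = _find(parent, b)
    clusters = {}
    for f in candidates:
        clusters.setdefault(_find(parent, f), []).append(f)

    # One representative per cluster: its most class-relevant feature.
    return sorted(max(members, key=relevance.get) for members in clusters.values())


if __name__ == "__main__":
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [0, 1, 0, 1, 0, 1, 0, 1]
    labels = [x | y for x, y in zip(a, b)]  # class = a OR b
    noise = [0, 0, 0, 0, 1, 1, 1, 1]        # independent of the class
    features = {"a": a, "a_copy": list(a), "b": b, "noise": noise}
    print(select_features(features, labels))  # expected: ['a', 'b']
```

In this toy data, `a_copy` duplicates `a` and lands in the same cluster, `b` is complementary to `a` (the class is `a OR b`, so the pair's interaction information is negative) and survives as its own representative, and `noise` is dropped in step (a) because it carries no information about the class. A maximum spanning tree over the redundancy weights is used here because it links each feature to its most redundant neighbor, so cutting the non-positive edges leaves groups joined only by redundant links.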
Pages: 65-74
Number of pages: 10
Related papers
50 in total
  • [1] A Clustering-Based Approach to Reduce Feature Redundancy
    de Amorim, Renato Cordeiro
    Mirkin, Boris
    [J]. KNOWLEDGE, INFORMATION AND CREATIVITY SUPPORT SYSTEMS: RECENT TRENDS, ADVANCES AND SOLUTIONS, KICSS 2013, 2016, 364 : 465 - 475
  • [2] A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
    Song, Qinbao
    Ni, Jingjie
    Wang, Guangtao
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (01) : 1 - 14
  • [3] An Improved Fast Clustering-Based Feature Subset Selection Algorithm for Multi Featured dataset
    Sharma, Poonam
    Mathur, Abhisek
    Chaturvedi, Sushil
    [J]. 2014 INTERNATIONAL CONFERENCE ON ADVANCES IN ENGINEERING AND TECHNOLOGY RESEARCH (ICAETR), 2014,
  • [4] A clustering-based feature selection via feature separability
    Jiang, Shengyi
    Wang, Lianxi
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2016, 31 (02) : 927 - 937
  • [5] Clustering-Based Subset Selection in Evolutionary Multiobjective Optimization
    Chen, Weiyu
    Ishibuchi, Hisao
    Shang, Ke
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 468 - 475
  • [6] Feature Subset Selection based on Redundancy Maximized Clusters
    Tarek, Md Hasan
    Kadir, Md Eusha
    Sharmin, Sadia
    Sajib, Abu Ashfaqur
    Ali, Amin Ahsan
    Shoyaib, Mohammad
    [J]. 20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 521 - 526
  • [7] Implementation of FAST Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
    Shilu, Smit
    Sheth, Kushal
    Mehul, Ekata
    [J]. PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABLE DEVELOPMENT ICT4SD 2015, VOL 2, 2016, 409 : 203 - 213
  • [8] Fuzzy Clustering-based GMDH Model to Feature Selection in Customer Analysis
    Zhao, Hengjun
    He, Changzheng
    Ye, Zhen
    [J]. ISBIM: 2008 INTERNATIONAL SEMINAR ON BUSINESS AND INFORMATION MANAGEMENT, VOL 1, 2009, : 461 - 464
  • [9] A new feature selection algorithm based on relevance, redundancy and complementarity
    Li, Chao
    Luo, Xiao
    Qi, Yanpeng
    Gao, Zhenbo
    Lin, Xiaohui
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2020, 119
  • [10] Feature ranking based consensus clustering for feature subset selection
    Rani, D. Sandhya
    Rani, T. Sobha
    Bhavani, S. Durga
    Krishna, G. Bala
    [J]. APPLIED INTELLIGENCE, 2024, 54 (17-18) : 8154 - 8169