Feature ranking based consensus clustering for feature subset selection

Cited by: 0
Authors
Rani, D. Sandhya [1, 2]
Rani, T. Sobha [2]
Bhavani, S. Durga [2]
Krishna, G. Bala [1]
Affiliations
[1] CVR Coll Engn, Comp Sci & Engn, Hyderabad 501015, Telangana, India
[2] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, Telangana, India
Keywords
Feature subset; Consensus clustering; Feature ranking; Large dataset; Mutual information; Classification; Algorithm; Relevance
DOI
10.1007/s10489-024-05566-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature subset selection is an NP-hard problem, so there is a need for computationally efficient algorithms that find near-optimal feature subsets that improve the performance of a classifier. Two major challenges for feature subset selection are high-dimensional data, that is, data with a large number of features, and large datasets. The scalability of feature selection algorithms, in terms of accuracy on high-dimensional data and time taken on large datasets, is an important issue. We propose a consensus-clustering-based approach to feature selection that addresses these issues. Many computationally efficient greedy feature ranking algorithms exist in the literature, and each assigns a different ranking order to the features. A consensus among these rankings may provide a feature ranking that performs well with respect to both time and accuracy. The goal of this work is to propose efficient algorithms that work on small as well as large datasets. The contributions of this work are: (i) Feature Ranking based on Consensus Clustering (FRCC), a fast and scalable approach to feature selection designed using the available feature ranking algorithms from the literature; and (ii) Hybrid Feature Selection (HFS), a parallelizable version of FRCC proposed to address feature reduction in large datasets. The implementation results show that FRCC clearly outperforms many recent algorithms in the literature on both small- and large-dimensional datasets. HFS has been run on datasets with hundreds of thousands (lakhs) of instances and dimensionality in the hundreds and thousands. HFS proves to be very effective in terms of feature reduction and accuracy compared with the results obtained by recent algorithms in the literature.
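The idea of combining several feature rankings into one consensus ranking can be illustrated with a minimal sketch. It is not the FRCC algorithm, whose consensus clustering step the abstract does not spell out. Instead it uses a plain mean-rank (Borda-style) aggregation as a stand-in, and the function names, feature names, and example rankings are all hypothetical.

```python
from statistics import mean


def consensus_ranking(rankings):
    """Aggregate feature rankings by mean rank position (Borda-style).

    NOTE: a stand-in for illustration, not the FRCC method from the paper.
    `rankings` is a list of lists; each inner list orders the same feature
    names from best to worst.
    """
    features = set(rankings[0])
    for ranking in rankings:
        # Every ranker must order exactly the same features, once each.
        if len(ranking) != len(features) or set(ranking) != features:
            raise ValueError("each ranking must order the same features once")

    # Collect the position each ranker assigned to each feature.
    positions = {feature: [] for feature in features}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            positions[feature].append(position)

    # Lower mean position is better; ties are broken by feature name.
    return sorted(features, key=lambda f: (mean(positions[f]), f))


def select_top_k(rankings, k):
    """Return the k best features under the consensus ranking."""
    return consensus_ranking(rankings)[:k]


# Hypothetical outputs of three different feature ranking algorithms.
rank_a = ["f1", "f2", "f3", "f4"]
rank_b = ["f2", "f1", "f4", "f3"]
rank_c = ["f1", "f3", "f2", "f4"]

selected = select_top_k([rank_a, rank_b, rank_c], k=2)
```

Here `selected` holds the two features with the lowest mean position across the three rankings. A real pipeline would obtain each ranking from an actual ranking algorithm and choose `k` by validating classifier accuracy.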
Pages: 8154-8169
Number of pages: 16