Feature ranking based consensus clustering for feature subset selection

Cited by: 0
Authors
Rani, D. Sandhya [1, 2]
Rani, T. Sobha [2]
Bhavani, S. Durga [2]
Krishna, G. Bala [1]
Affiliations
[1] CVR College of Engineering, Computer Science and Engineering, Hyderabad 501015, Telangana, India
[2] University of Hyderabad, School of Computer and Information Sciences, Hyderabad 500046, Telangana, India
Keywords
Feature subset; Consensus clustering; Feature ranking; Large dataset; Mutual information; Classification; Algorithm; Relevance
DOI
10.1007/s10489-024-05566-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The feature subset selection problem is NP-hard, so computationally efficient algorithms are needed that find near-optimal feature subsets which improve the performance of a classifier. Two major challenges for feature subset selection are high-dimensional data (that is, data with a large number of features) and large datasets. The scalability of feature selection algorithms, in terms of accuracy on high-dimensional data and running time on large datasets, is therefore an important issue. We propose a consensus-clustering-based approach to feature selection that addresses these issues.

Many computationally efficient greedy feature ranking algorithms exist in the literature, and each assigns a different ranking order to the features. A consensus among these rankings may yield a feature ranking that performs well with respect to both time and accuracy. The goal of this work is to propose efficient algorithms that work on small as well as large datasets. The contributions of this work are:
i. A fast and scalable feature selection approach, Feature Ranking based on Consensus Clustering (FRCC), designed using feature ranking algorithms available in the literature.
ii. A parallelizable version of FRCC, namely Hybrid Feature Selection (HFS), proposed to address feature reduction in large datasets.

The implementation results show that FRCC clearly outperforms many recent algorithms in the literature on datasets of small as well as large dimensionality. HFS has been implemented on datasets with hundreds of thousands (lakhs) of instances and dimensionality in the hundreds and thousands. HFS proves very effective in terms of feature reduction and accuracy compared with the results obtained by recent algorithms in the literature.
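The abstract rests on combining the orderings produced by several feature ranking algorithms into a single consensus ranking. The following minimal sketch illustrates only that general idea, using a simple Borda-count rank aggregation; Borda count is a swapped-in technique chosen for brevity, not the consensus clustering procedure of FRCC, whose details the abstract does not give. The function names `borda_consensus` and `select_top_k` and the example rankings are hypothetical.

```python
from typing import Dict, List, Sequence


def borda_consensus(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Combine several feature rankings (best feature first) into one order.

    Borda count (illustrative stand-in, not FRCC): in a ranking of n features,
    the feature at 0-based position p receives n - p points. Features are
    sorted by total points, highest first; ties are broken by feature name
    so the result is deterministic.
    """
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - position)
    return sorted(scores, key=lambda f: (-scores[f], f))


def select_top_k(rankings: Sequence[Sequence[str]], k: int) -> List[str]:
    """Hypothetical helper: keep the k highest-ranked consensus features."""
    return borda_consensus(rankings)[:k]


if __name__ == "__main__":
    # Hypothetical outputs of three different feature ranking algorithms.
    rankings = [
        ["f3", "f1", "f2", "f4"],
        ["f1", "f3", "f4", "f2"],
        ["f3", "f2", "f1", "f4"],
    ]
    print(borda_consensus(rankings))  # ['f3', 'f1', 'f2', 'f4']
    print(select_top_k(rankings, 2))  # ['f3', 'f1']
```

Here f3 collects 4 + 3 + 4 = 11 points and f1 collects 3 + 4 + 2 = 9, so both lead the consensus even though neither is first in every input ranking. Taking the top k of the consensus order yields a feature subset; the choice of k and of the input ranking algorithms is left open in this sketch.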
Pages: 8154-8169
Number of pages: 16