Significance of entropy correlation coefficient over symmetric uncertainty on Fast Clustering feature selection algorithm

Authors
Malji, Pallavi [1]
Sakhare, Sachin [1]
Affiliations
[1] Vishwakarma Inst Informat Technol, Dept Comp Engn, Pune, Maharashtra, India
Source
PROCEEDINGS OF 2017 11TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO 2017) | 2017
Keywords
FAST; Feature selection; clustering; minimum spanning tree; entropy correlation coefficient; Naive Bayes classifier; Kruskal's algorithm
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Feature selection identifies a subset of the most useful features from the original feature set. Comparing results obtained with the original set and with the selected subset, we observe that the results are comparable. A feature selection algorithm is evaluated on efficiency and effectiveness, that is, on the time it requires and the optimality of the feature subset it produces. On this basis, this paper modifies the FAST clustering feature selection algorithm to examine the impact of the entropy correlation coefficient on it. In the modified algorithm, the correlation between features is computed with the entropy correlation coefficient instead of symmetric uncertainty, and the features are then divided into clusters using graph-based clustering methods. The representative features, i.e., those strongly related to the target class, are then selected from the clusters. To ensure the algorithm's efficiency, we adopt the Kruskal minimum spanning tree (MST) clustering method. We compare the proposed algorithm with the FAST clustering feature selection algorithm using a well-known classifier, the probability-based Naive Bayes classifier, before and after feature selection. Results on two publicly available, real-world, high-dimensional text data sets show that the proposed algorithm produces a smaller and optimal feature subset and also improves classifier performance. The processing time required by the proposed algorithm is far less than that of the FAST clustering algorithm.
Pages: 457-463
Page count: 7
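The two correlation measures named in the abstract, and the Kruskal MST step, can be illustrated with a minimal Python sketch. The abstract does not state the paper's formulas, so everything here is an assumption: symmetric uncertainty is taken as 2·I(X;Y) / (H(X) + H(Y)), the entropy correlation coefficient is taken in one form found in the literature, the square root of that quantity, and edge weights in the feature graph are simply the pairwise measure values. The function names (`feature_mst` and the rest) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)); returns 0 if both are constant."""
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2.0 * mutual_information(x, y) / denom


def entropy_correlation_coefficient(x, y):
    """Assumed form: ECC = sqrt(SU). The paper may define it differently."""
    # max(...) guards against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, symmetric_uncertainty(x, y)))


def kruskal_mst(n_nodes, edges):
    """Kruskal's algorithm with union-find; edges are (weight, u, v) tuples."""
    parent = list(range(n_nodes))

    def find(a):
        # Follow parent links to the root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it creates no cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def feature_mst(features, measure):
    """Build the complete feature graph weighted by `measure`, return its MST."""
    edges = [
        (measure(features[i], features[j]), i, j)
        for i, j in combinations(range(len(features)), 2)
    ]
    return kruskal_mst(len(features), edges)
```

Calling `feature_mst(features, entropy_correlation_coefficient)` on a list of discretized feature columns returns the spanning tree edges. Passing `symmetric_uncertainty` instead swaps the measure, which is the comparison the abstract describes. The sketch stops at the tree and does not partition it into clusters or select representative features.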