FCFilter: Feature Selection based on Clustering and Genetic Algorithms

Cited by: 0
Authors
Ferreira, Charles H. P. [1]
de Medeiros, Debora M. R. [1]
Santana, Fabiana [2]
Affiliations
[1] Fed Univ ABC UFABC, Ctr Math Computat & Cognit, Santo Andre, SP, Brazil
[2] Univ Canberra, Fac Educ Sci Technol & Math, Canberra, ACT 2601, Australia
Keywords
VALIDITY;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The search for patterns in large amounts of textual data, or text mining, can be at once rewarding and challenging. The patterns can reveal tendencies, similarities and predictions, but the information is usually implicit and difficult to validate. Classification is one of the most relevant research areas in text mining, and it usually consists of predicting the class of a textual document, such as its author or topic, based on a set of documents previously organized into classes. Choosing the words that compose the feature set is crucial to proper classification. A well-selected feature set can improve the performance of the classification method and clarify the interpretation of the classification model fitted to the data. This paper introduces the Feature Cluster Filter (FCFilter) method for feature selection. The method selects features that are good predictors for text classification by clustering the features and keeping only the suitable clusters. FCFilter eliminates the need to input or optimize the number of clusters by grouping the words into a sufficiently high number of clusters. Genetic algorithms are then applied to optimize the combination of clusters that provides the final feature set. Experiments evaluating FCFilter on the Reuters-21578, SCY-Genes and SCY-Clusters datasets showed a significant reduction in the dimensionality of the feature-value table, with slight improvements in classification accuracy compared to the baselines. The results are promising and indicate potential improvements in research on feature selection for text mining.
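The cluster-then-select idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering method, the classifier, the fitness function or the genetic operators, so everything below is an assumption made for illustration. The word clusters are taken as given (a hypothetical list of column-index groups), the fitness is the training accuracy of a simple nearest-centroid classifier minus a small per-feature penalty, and the genetic algorithm uses elitism, one-point crossover and bit-flip mutation over a bitmask with one bit per cluster.

```python
import random

def centroid_accuracy(X, y, cols):
    """Training accuracy of a nearest-centroid classifier using only columns `cols`."""
    if not cols:
        return 0.0
    labels = sorted(set(y))
    centroids = {}
    for c in labels:
        rows = [x for x, t in zip(X, y) if t == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        v = [x[j] for j in cols]
        pred = min(labels,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
        correct += pred == t
    return correct / len(X)

def selected_features(mask, clusters):
    """Expand a cluster bitmask into the list of feature (column) indices it keeps."""
    return [j for bit, members in zip(mask, clusters) if bit for j in members]

def fitness(mask, X, y, clusters, penalty=0.01):
    """Assumed fitness: accuracy minus a small cost per selected feature."""
    cols = selected_features(mask, clusters)
    return centroid_accuracy(X, y, cols) - penalty * len(cols)

def evolve(X, y, clusters, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Search over cluster bitmasks with a basic elitist GA (needs >= 2 clusters)."""
    rng = random.Random(seed)
    k = len(clusters)
    pop = [[rng.randint(0, 1) for _ in range(k)] for _ in range(pop_size)]
    score = lambda m: fitness(m, X, y, clusters)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, k)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=score)

if __name__ == "__main__":
    # Toy term-count table: columns 0-1 separate the classes, columns 2-5 are constant.
    X = [[3, 2, 1, 1, 1, 1], [2, 3, 1, 1, 1, 1],
         [0, 0, 1, 1, 1, 1], [0, 1, 1, 1, 1, 1]]
    y = ["a", "a", "b", "b"]
    clusters = [[0, 1], [2, 3], [4, 5]]       # hypothetical word clusters
    best = evolve(X, y, clusters)
    print(best, selected_features(best, clusters))
```

Selecting whole clusters rather than individual words keeps the search space at one bit per cluster, which is why the number of clusters only needs to be large enough, not tuned exactly. In a realistic setting the fitness would be estimated on held-out data rather than on the training set.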
Pages: 2106-2113
Page count: 8