Sampling and feature selection in a genetic algorithm for document clustering

Cited by: 0
Authors
Casillas, A [1]
de Lena, MTG
Martínez, R
Affiliations
[1] Univ Basque Country, Dpt Elect & Elect, E-48080 Bilbao, Spain
[2] Univ Rey Juan Carlos, Dpt Informat Estadist & Telemat, Madrid, Spain
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper we describe a genetic algorithm (GA) for document clustering that includes a sampling technique to reduce computation time. The algorithm calculates an approximation of the optimum k value and finds the best grouping of the documents into these k clusters. We evaluate the algorithm with sets of documents that are the output of a query in a search engine. Two types of experiment are carried out to determine (1) how the genetic algorithm works with a sample of documents, and (2) which document features lead to the best clustering according to an external evaluation. On the one hand, our GA with sampling performs the clustering in a time that makes interaction with a search engine viable. On the other hand, our GA approach with documents represented by entities leads to better results than representation by lemmas only.
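The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: run a genetic algorithm on a random sample of documents, let it search over both k and the grouping, then assign the full collection to the clusters it found. Everything specific here is an assumption made for illustration and not the authors' method: the medoid-based encoding, the Calinski-Harabasz-style fitness, the crossover and mutation operators, and all function and parameter names (`ga_cluster`, `sample_size`, `k_max`, and so on). Documents are stood in for by small numeric vectors.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: sq_dist(p, medoids[j]))
            for p in points]

def fitness(points, medoid_ids):
    """Calinski-Harabasz-style score (an assumed criterion); higher is better."""
    k, n, dim = len(medoid_ids), len(points), len(points[0])
    labels = assign(points, [points[i] for i in medoid_ids])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:  # possible only when two medoids coincide
            continue
        centre = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sq_dist(centre, overall)
        within += sum(sq_dist(p, centre) for p in members)
    return (between / (k - 1)) / (within / (n - k) + 1e-12)

def crossover(a, b, rng):
    """Child draws its medoids from the union of both parents' medoids."""
    pool = sorted(set(a) | set(b))
    return rng.sample(pool, rng.choice((len(a), len(b))))

def mutate(ids, n, k_max, rng):
    """Add, drop, or swap one medoid, keeping 2 <= k <= k_max."""
    ids = list(ids)
    unused = [i for i in range(n) if i not in ids]
    move = rng.choice(("add", "drop", "swap"))
    if move == "add" and len(ids) < k_max and unused:
        ids.append(rng.choice(unused))
    elif move == "drop" and len(ids) > 2:
        ids.remove(rng.choice(ids))
    elif unused:
        ids[rng.randrange(len(ids))] = rng.choice(unused)
    return ids

def ga_cluster(docs, sample_size=30, k_max=6, pop_size=20, generations=40, seed=0):
    """Evolve medoid sets on a sample, then label every document.

    Assumes the sample holds more than k_max documents.
    """
    rng = random.Random(seed)
    sample = rng.sample(docs, min(sample_size, len(docs)))
    n = len(sample)
    score = lambda ids: fitness(sample, ids)
    # Each individual is a list of distinct sample indices used as medoids.
    pop = [rng.sample(range(n), rng.randint(2, k_max)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), n, k_max, rng))
        pop = elite + children
    best = max(pop, key=score)
    medoids = [sample[i] for i in best]
    return len(best), assign(docs, medoids)

# Toy data: three synthetic 2-D "document" groups of 20 vectors each.
data_rng = random.Random(1)
docs = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
        for cx, cy in [(0, 0), (10, 0), (5, 9)] for _ in range(20)]
k, labels = ga_cluster(docs)
print("estimated k:", k)
print("first labels:", labels[:5])
```

In this sketch the fitness is only ever computed on the sample, so the cost of each generation depends on `sample_size` rather than on the size of the full collection, which is the kind of saving the abstract attributes to sampling.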
Pages: 601-612
Number of pages: 12
Related papers
50 in total
  • [21] A harmony search algorithm for clustering with feature selection
    Cobos, Carlos
    Leon, Elizabeth
    Mendoza, Martha
    [J]. REVISTA FACULTAD DE INGENIERIA-UNIVERSIDAD DE ANTIOQUIA, 2010, (55): 153 - 164
  • [22] Filtering methods for feature selection in web-document clustering
    Park, Heum
    Kwon, Hyuk-Chul
    [J]. COMPUTATIONAL SCIENCE - ICCS 2007, PT 2, PROCEEDINGS, 2007, 4488 : 1218 - +
  • [23] Simultaneous Feature Selection and Clustering for Categorical Features Using Multi Objective Genetic Algorithm
    Dutta, Dipankar
    Dutta, Paramartha
    Sil, Jaya
    [J]. 2012 12TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS), 2012: 191 - 196
  • [24] Simultaneous Continuous Feature Selection and K Clustering by Multi Objective Genetic Algorithm
    Dutta, Dipankar
    Dutta, Paramartha
    Sil, Jaya
    [J]. PROCEEDINGS OF THE 2013 3RD IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2013: 937 - 942
  • [25] Genetic Programming as a Feature Selection Algorithm
    Suarez, Ranyart R.
    Maria Valencia-Ramirez, Jose
    Graff, Mario
    [J]. 2014 IEEE INTERNATIONAL AUTUMN MEETING ON POWER, ELECTRONICS AND COMPUTING (ROPEC), 2014,
  • [26] Genetic programming with a genetic algorithm for feature construction and selection
    Smith M.G.
    Bull L.
    [J]. Genetic Programming and Evolvable Machines, 2005, 6 (3) : 265 - 281
  • [27] Feature selection algorithm based on quantum genetic algorithm
    Zhang, Ge-Xiang
    Jin, Wei-Dong
    Hu, Lai-Zhao
    [J]. Kongzhi Lilun Yu Yingyong/Control Theory and Applications, 2005, 22 (05): 810 - 813
  • [28] An Efficient Productive Feature Selection and Document Clustering (PFS-DocC) Model for Document Clustering Document Clustering using PFS-DocC Model
    Pitchandi, Perumal
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (04) : 125 - 133
  • [29] An Efficient Algorithm Combining Spectral Clustering with Feature Selection
    Qimin Luo
    Guoqiu Wen
    Leyuan Zhang
    Mengmeng Zhan
    [J]. Neural Processing Letters, 2020, 52 : 1913 - 1925
  • [30] FCFilter: Feature Selection based on Clustering and Genetic Algorithms
    Ferreira, Charles H. P.
    de Medeiros, Debora M. R.
    Santana, Fabiana
    [J]. 2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2016: 2106 - 2113