Sampling and feature selection in a genetic algorithm for document clustering

Cited by: 0

Authors:

Casillas, A [1]

de Lena, MTG

Martínez, R

Affiliations:

[1] Univ Basque Country, Dpt Elect & Elect, E-48080 Bilbao, Spain

[2] Univ Rey Juan Carlos, Dpt Informat Estadist & Telemat, Madrid, Spain

Source:

COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING | 2004 / Vol. 2945

DOI:

Not available

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

In this paper we describe a genetic algorithm (GA) for document clustering that includes a sampling technique to reduce computation time. The algorithm calculates an approximation of the optimum k value and finds the best grouping of the documents into these k clusters. We evaluate the algorithm on sets of documents returned by a search engine query. Two types of experiment are carried out to determine (1) how the genetic algorithm works with a sample of documents, and (2) which document features lead to the best clustering according to an external evaluation. On the one hand, our GA with sampling performs the clustering in a time that makes interaction with a search engine viable. On the other hand, our GA approach with documents represented by entities leads to better results than representation by lemmas only.

Pages: 601-612

Page count: 12