Comparison of Clustering Algorithms in Text Clustering Tasks

Cited by: 2
Authors
Gallardo Garcia, Rafael [1]
Beltran, Beatriz [1, 2]
Vilarino, Darnes [1, 2]
Zepeda, Claudia [1]
Martinez, Rodolfo [1]
Affiliations
[1] Benemerita Univ Autonoma Puebla, Fac Comp Sci, Puebla, Mexico
[2] Benemerita Univ Autonoma Puebla, Language & Knowledge Engn Lab, Puebla, Mexico
Source
COMPUTACION Y SISTEMAS | 2020, Vol. 24, No. 2
Keywords
Affinity propagation; f-measure; k-means; spectral clustering; PAN
DOI
10.13053/CyS-24-2-3369
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper compares the performance and accuracy of several clustering algorithms on text clustering tasks. Texts were preprocessed with Term Frequency-Inverse Document Frequency (TF-IDF) weighting to obtain a weight for each word in each text and, from these, weights for each text. Cosine similarity was used as the similarity measure between texts. The clustering tasks were run on the PAN dataset with three algorithms: Affinity Propagation, K-Means, and Spectral Clustering. The results are presented in comparative tables that list the task ID, the ground-truth clusters, and the clusters generated by each algorithm. A table of precision, recall, and F-measure scores is also presented.
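The TF-IDF weighting and cosine similarity steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the whitespace tokenizer, the smoothed IDF formula, and the function names `tfidf_vectors`, `cosine_similarity`, and `similarity_matrix` are all assumptions chosen for illustration, and the abstract does not say which TF-IDF variant was used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumptions (not taken from the paper): lowercase whitespace
    tokenization, length-normalized term frequency, and a smoothed IDF
    of ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Pairwise cosine similarities between the TF-IDF vectors of docs."""
    vectors = tfidf_vectors(docs)
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]


if __name__ == "__main__":
    sample = ["the cat sat", "the cat sat", "dogs bark loudly"]
    for row in similarity_matrix(sample):
        print([round(value, 3) for value in row])
```

A matrix like this could serve as a precomputed affinity for algorithms that accept pairwise similarities, such as affinity propagation or spectral clustering, while k-means would typically operate on the document vectors themselves. How the authors wired these steps together is not stated in the abstract.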
Pages: 429-437
Page count: 9