Comparison of Clustering Algorithms in Text Clustering Tasks

Cited by: 2
Authors
Gallardo Garcia, Rafael [1]
Beltran, Beatriz [1, 2]
Vilarino, Darnes [1, 2]
Zepeda, Claudia [1]
Martinez, Rodolfo [1]
Affiliations
[1] Benemerita Univ Autonoma Puebla, Fac Comp Sci, Puebla, Mexico
[2] Benemerita Univ Autonoma Puebla, Language & Knowledge Engn Lab, Puebla, Mexico
Source
COMPUTACION Y SISTEMAS | 2020 / Vol. 24 / No. 02
Keywords
Affinity propagation; f-measure; k-means; spectral clustering; PAN;
DOI
10.13053/CyS-24-2-3369
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The purpose of this paper is to compare the performance and accuracy of several clustering algorithms in text clustering tasks. Text preprocessing was performed using Term Frequency-Inverse Document Frequency (TF-IDF) to obtain a weight for each word in each text and, from those, weights for each text. Cosine similarity was used as the similarity measure between texts. The clustering tasks were carried out on the PAN dataset with three algorithms: Affinity Propagation, K-Means, and Spectral Clustering. The results are presented in comparative tables listing the task ID, the ground-truth clusters, and the clusters generated by each algorithm, together with a table of precision, recall, and F-measure scores.
Pages: 429 - 437
Number of pages: 9
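
The preprocessing step described in the abstract (TF-IDF weighting followed by cosine similarity between texts) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which TF-IDF variant or tokenizer was used, so the whitespace tokenization, the unsmoothed `log(N / df)` weighting, the sample texts, and the function names `tfidf_vectors`, `cosine_similarity`, and `similarity_matrix` are all assumptions made here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (assumed plain TF-IDF)."""
    # Assumption: lowercase + whitespace tokenization; the paper may differ.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Relative term frequency times unsmoothed inverse document frequency.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    """Pairwise cosine similarities between the TF-IDF vectors of docs."""
    vectors = tfidf_vectors(docs)
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical sample texts, not taken from the PAN dataset.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
sim = similarity_matrix(docs)
```

A pairwise similarity matrix like `sim` is the kind of input that similarity-based algorithms such as Affinity Propagation or Spectral Clustering can work from, whereas K-Means would typically operate on the TF-IDF vectors themselves.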