A new AntTree-based algorithm for clustering short-text corpora

Authors
Luis Errecalde, Marcelo [1]
Alejandro Ingaramo, Diego [1]
Rosso, Paolo [2]
Affiliations
[1] Univ Nacl San Luis, Dev & Res Lab Computac Intelligence LIDIC, San Luis, Argentina
[2] Univ Politecn Valencia, Dept Sistemas Informat & Comp, Nat Language Engn Lab ELiRF, Valencia, Spain
Keywords
Short-text clustering; Bio-inspired algorithms; AntTree; Internal Validity Measures; Silhouette Coefficient
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Short-text clustering has become an important research area because people increasingly communicate in "small-language" forms such as blogs and text messaging. Recent work has proposed new bio-inspired clustering algorithms for this difficult problem and has presented novel uses of Internal Clustering Validity Measures. This work proposes a new AntTree-based approach for the task. It integrates information on the Silhouette Coefficient and the concept of the attraction of a cluster at different stages of the clustering process. The proposal achieves results comparable to the best reported in this area, shows stable result quality, and shows promise as a general improvement method for arbitrary clustering approaches.
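The abstract names the Silhouette Coefficient as one of the signals integrated into the clustering process. As background only, the following is a minimal sketch of the standard Silhouette Coefficient computed from a precomputed distance matrix; it is not the authors' AntTree-based algorithm, and the function names `silhouette_values` and `silhouette_coefficient` are hypothetical names chosen for this illustration. It assumes the usual definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from item i to the other members of its cluster and b(i) is the smallest mean distance from i to the members of any other cluster.

```python
from typing import Dict, List, Sequence


def silhouette_values(dist: Sequence[Sequence[float]],
                      labels: Sequence[int]) -> List[float]:
    """Per-item silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)).

    `dist` is a symmetric matrix of pairwise distances and `labels[i]`
    is the cluster assigned to item i (illustrative interface, assumed).
    """
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    values = []
    for i, lab_i in enumerate(labels):
        own = clusters[lab_i]
        if len(own) == 1:
            # Common convention: a singleton cluster scores 0.
            values.append(0.0)
            continue
        # a(i): mean distance to the other members of i's own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of another cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != lab_i
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def silhouette_coefficient(dist: Sequence[Sequence[float]],
                           labels: Sequence[int]) -> float:
    """Global coefficient: the mean of the per-item silhouettes."""
    vals = silhouette_values(dist, labels)
    return sum(vals) / len(vals)


# Toy data: four items placed at 0, 1, 10 and 11 on a line.
points = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(p - q) for q in points] for p in points]
good = silhouette_coefficient(toy_dist, [0, 0, 1, 1])
bad = silhouette_coefficient(toy_dist, [0, 1, 0, 1])
```

In this toy example the grouping that keeps nearby items together scores higher than the grouping that mixes them, which is the property that makes the coefficient usable as an internal validity measure when no reference labels are available.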
Pages: 1 / 7
Page count: 7