A document clustering algorithm for discovering and describing topics

Cited by: 26
Authors
Anaya-Sanchez, Henry [1]
Pons-Porrata, Aurora [2]
Berlanga-Llavori, Rafael [1]
Affiliations
[1] Univ Jaume 1, Dept Languages & Comp Syst, Castellon de La Plana, Spain
[2] Univ Oriente, Ctr Pattern Recognit & Data Min, Santiago De Cuba, Cuba
Keywords
Document clustering; Topic discovery; Topic description
DOI
10.1016/j.patrec.2009.11.013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce a new clustering algorithm for discovering and describing the topics contained in a text collection. Our proposal relies on both the most probable term pairs generated from the collection and the estimation of the topic homogeneity associated with these pairs. Topics and their descriptions are generated from those term pairs whose support sets are homogeneous enough to represent collection topics. Experimental results obtained over three benchmark text collections demonstrate the effectiveness and utility of this new approach. (C) 2009 Published by Elsevier B.V.
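The abstract describes a pipeline of three steps: find the most probable term pairs, estimate how topically homogeneous each pair's support set (the documents containing both terms) is, and keep only the pairs whose support sets are homogeneous enough to stand for a topic. The Python sketch below illustrates that general idea under assumptions of our own, not the paper's actual estimators: a pair's probability is approximated by the fraction of documents containing both terms, and "homogeneity" is taken to be the mean pairwise cosine similarity of the support set's term-frequency vectors. The names `discover_topics`, `homogeneity`, `min_prob`, and `threshold` are hypothetical, and the step of merging accepted pairs into final clusters is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def homogeneity(vectors, support):
    """Hypothetical homogeneity score: mean pairwise cosine similarity
    of the documents in a term pair's support set (needs >= 2 docs)."""
    sims = [cosine(vectors[i], vectors[j]) for i, j in combinations(support, 2)]
    return sum(sims) / len(sims)


def discover_topics(docs, min_prob=0.2, threshold=0.5):
    """Return (term_pair, support_set, score) for each accepted pair.

    Assumption: P(pair) is estimated as the fraction of documents that
    contain both terms; this is not the paper's probability model."""
    n = len(docs)
    vectors = [Counter(doc) for doc in docs]
    term_sets = [set(doc) for doc in docs]

    # Count, for every unordered term pair, how many documents contain it.
    pair_counts = Counter()
    for terms in term_sets:
        pair_counts.update(combinations(sorted(terms), 2))

    topics = []
    for (a, b), count in pair_counts.most_common():
        if count / n < min_prob or count < 2:
            continue  # not probable enough, or too few docs to assess homogeneity
        support = [i for i, terms in enumerate(term_sets) if a in terms and b in terms]
        score = homogeneity(vectors, support)
        if score >= threshold:
            topics.append(((a, b), support, score))
    return topics


if __name__ == "__main__":
    docs = [
        ["stock", "market", "price", "shares"],
        ["stock", "market", "shares", "trading"],
        ["stock", "market", "price", "trading"],
        ["football", "match", "goal", "team"],
        ["football", "match", "team", "league"],
        ["apple", "release", "iphone", "new"],
        ["apple", "pie", "recipe", "new"],
    ]
    for pair, support, score in discover_topics(docs, min_prob=0.25, threshold=0.6):
        print(pair, support, round(score, 2))
```

In this toy run, the pair ("apple", "new") is frequent enough to be considered, but its two support documents (a phone release and a pie recipe) share little vocabulary, so its homogeneity falls below the threshold and it is rejected. That filtering by support-set homogeneity, rather than by frequency alone, is the intuition the abstract points to.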
Pages: 502-510
Number of pages: 9
Related papers (50 in total)
  • [31] Spherical credibilistic clustering algorithm for document data
    Chen, Xuan
    Zhou, Jian
    PROCEEDING OF THE SEVENTH INTERNATIONAL CONFERENCE ON INFORMATION AND MANAGEMENT SCIENCES, 2008, 7 : 295 - 300
  • [32] EFFECT OF DOCUMENT ORDERING IN ROCCHIOS CLUSTERING ALGORITHM
    CODY, R
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1973, 24 (03): 232 - 233
  • [33] WAF-based Document Clustering Algorithm
    Luo, Yang
    Chen, Guang
    Zhang, Yongtian
    2011 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), VOLS 1-4, 2012, : 14 - 16
  • [34] An Improved Genetic Algorithm for Document Clustering on the Cloud
    Akter, Ruksana
    Chung, Yoojin
    INTERNATIONAL JOURNAL OF CLOUD APPLICATIONS AND COMPUTING, 2018, 8 (04) : 20 - 28
  • [35] Improved Cuckoo Search Algorithm for Document Clustering
    Boushaki, Saida Ishak
    Kamel, Nadjet
    Bendjeghaba, Omar
    COMPUTER SCIENCE AND ITS APPLICATIONS, CIIA 2015, 2015, 456 : 217 - 228
  • [36] Double Layered Genetic Algorithm for Document Clustering
    Choi, Lim Cheon
    Lee, Jung Song
    Park, Soon Cheol
    SOFTWARE ENGINEERING, BUSINESS CONTINUITY, AND EDUCATION, 2011, 257 : 212 - 218
  • [37] A Local Graph Clustering Algorithm for Discovering Subgoals in Reinforcement Learning
    Entezari, Negin
    Shiri, Mohammad Ebrahim
    Moradi, Parham
    COMMUNICATION AND NETWORKING, PT II, 2010, 120 : 41 - 50
  • [38] An improved OPTICS clustering algorithm for discovering clusters with uneven densities
    Tang, Chunhua
    Wang, Han
    Wang, Zhiwen
    Zeng, Xiangkun
    Yan, Huaran
    Xiao, Yingjie
    INTELLIGENT DATA ANALYSIS, 2021, 25 (06) : 1453 - 1471
  • [39] A novel nonparametric clustering algorithm for discovering arbitrary shaped clusters
    He, Y
    Chen, LH
    ICICS-PCM 2003, VOLS 1-3, PROCEEDINGS, 2003, : 1826 - 1830
  • [40] Domain of interests clustering algorithm based on users' preferred topics
    Gong, Wei-Hua
    Yang, Liang-Huai
    Jin, Rong
    Ding, Wei-Long
    Tongxin Xuebao/Journal on Communications, 2011, 32 (01): 72 - 78