A document clustering algorithm for discovering and describing topics

Cited by: 26
Authors
Anaya-Sanchez, Henry [1]
Pons-Porrata, Aurora [2]
Berlanga-Llavori, Rafael [1]
Affiliations
[1] Univ Jaume 1, Dept Languages & Comp Syst, Castellon de La Plana, Spain
[2] Univ Oriente, Ctr Pattern Recognit & Data Min, Santiago De Cuba, Cuba
Keywords
Document clustering; Topic discovery; Topic description;
DOI
10.1016/j.patrec.2009.11.013
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we introduce a new clustering algorithm for discovering and describing the topics contained in a text collection. Our proposal relies on both the most probable term pairs generated from the collection and the estimation of the topic homogeneity associated with these pairs. Topics and their descriptions are generated from those term pairs whose support sets are homogeneous enough to represent collection topics. Experimental results obtained over three benchmark text collections demonstrate the effectiveness and utility of this new approach. © 2009 Published by Elsevier B.V.
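The abstract gives only the outline of the approach, so the following is a rough illustrative sketch of that outline and not the authors' algorithm. It assumes that a pair's "support set" is the set of documents containing both terms, that document frequency of the pair stands in for its probability, and that homogeneity is the mean pairwise cosine similarity of raw term-count vectors. All function names, parameters, and thresholds here (`support_sets`, `homogeneity`, `discover_topics`, `min_support`, `min_homogeneity`) are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def tokenize(text):
    # Naive whitespace tokenizer (assumption; the paper's preprocessing is not given here).
    return [w for w in text.lower().split() if w.isalpha()]


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def support_sets(docs):
    # Map each unordered term pair to the indices of documents containing both terms.
    support = {}
    for i, doc in enumerate(docs):
        for pair in combinations(sorted(set(doc)), 2):
            support.setdefault(pair, set()).add(i)
    return support


def homogeneity(doc_ids, vectors):
    # Assumed homogeneity measure: mean pairwise cosine similarity in the support set.
    ids = sorted(doc_ids)
    if len(ids) < 2:
        return 0.0
    sims = [cosine(vectors[i], vectors[j]) for i, j in combinations(ids, 2)]
    return sum(sims) / len(sims)


def discover_topics(texts, min_support=2, min_homogeneity=0.5):
    # Keep term pairs whose support sets are large and homogeneous enough.
    # The pair itself serves as a crude topic description.
    docs = [tokenize(t) for t in texts]
    vectors = [Counter(d) for d in docs]
    topics = []
    for pair, ids in support_sets(docs).items():
        if len(ids) >= min_support:
            h = homogeneity(ids, vectors)
            if h >= min_homogeneity:
                topics.append((pair, sorted(ids), h))
    # Most frequent pairs first (document frequency as a proxy for pair probability).
    topics.sort(key=lambda t: (-len(t[1]), -t[2], t[0]))
    return topics


if __name__ == "__main__":
    sample = [
        "solar panel energy grid",
        "solar energy panel cost",
        "football match goal score",
        "football goal match referee",
    ]
    for pair, ids, h in discover_topics(sample):
        print(pair, ids, round(h, 2))
```

In this toy setup several term pairs can share the same support set; a fuller implementation would need a step that merges or selects among them, which the abstract does not specify.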
Pages: 502-510
Page count: 9