A document clustering algorithm for discovering and describing topics

Cited by: 26
Authors
Anaya-Sanchez, Henry [1]
Pons-Porrata, Aurora [2]
Berlanga-Llavori, Rafael [1]
Affiliations
[1] Univ Jaume 1, Dept Languages & Comp Syst, Castellon de La Plana, Spain
[2] Univ Oriente, Ctr Pattern Recognit & Data Min, Santiago De Cuba, Cuba
Keywords
Document clustering; Topic discovery; Topic description
DOI
10.1016/j.patrec.2009.11.013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce a new clustering algorithm for discovering and describing the topics comprised in a text collection. Our proposal relies on both the most probable term pairs generated from the collection and the estimation of the topic homogeneity associated with these pairs. Topics and their descriptions are generated from those term pairs whose support sets are homogeneous enough to represent collection topics. Experimental results obtained over three benchmark text collections demonstrate the effectiveness and utility of this new approach. © 2009 Published by Elsevier B.V.
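The abstract gives only the outline of the approach, so the following is a rough Python sketch of that outline and not the authors' algorithm. Every concrete choice is an assumption made for illustration: the function names `cosine` and `discover_topics`, ranking term pairs by document co-occurrence count as a stand-in for pair probability, taking a pair's support set to be the documents that contain both terms, measuring homogeneity as the mean pairwise cosine similarity of those documents, and the `top_pairs` and `min_homogeneity` parameters.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def discover_topics(docs, top_pairs=3, min_homogeneity=0.5):
    """Return (term_pair, support_set, homogeneity) triples.

    Assumptions (not taken from the paper): pairs are ranked by how many
    documents contain both terms, and homogeneity is the mean pairwise
    cosine similarity inside the support set.
    """
    bags = [Counter(d.lower().split()) for d in docs]

    # Support set of a pair: indices of the documents containing both terms.
    pair_docs = {}
    for i, bag in enumerate(bags):
        for pair in combinations(sorted(bag), 2):
            pair_docs.setdefault(pair, []).append(i)

    # Keep the most frequent pairs; ties are broken alphabetically.
    ranked = sorted(pair_docs.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:top_pairs]

    topics = []
    for pair, support in ranked:
        if len(support) < 2:
            continue  # a single document cannot be checked for homogeneity
        sims = [cosine(bags[i], bags[j]) for i, j in combinations(support, 2)]
        homogeneity = sum(sims) / len(sims)
        if homogeneity >= min_homogeneity:
            # The term pair doubles as a short description of the topic.
            topics.append((pair, support, homogeneity))
    return topics
```

Calling `discover_topics` on a list of document strings returns one entry per retained term pair; the pair itself serves as the topic description and the support set as the topic's documents.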
Pages: 502-510
Number of pages: 9