Topic discovery based on text mining techniques

Cited by: 50
Authors
Pons-Porrata, Aurora
Berlanga-Llavori, Rafael
Ruiz-Shulcloper, Jose
Affiliations
[1] Univ Jaume 1, E-12071 Castellon de La Plana, Spain
[2] Univ Oriente, Ctr Pattern Recognit & Data Min, Santiago De Cuba 90500, Cuba
[3] Adv Technol Applicat Ctr, Havana, Cuba
Keywords
hierarchical clustering; text summarization; topic detection
DOI
10.1016/j.ipm.2006.06.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a topic discovery system that aims to reveal the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topics and subtopics, where each topic contains the set of documents related to it and a summary extracted from these documents. The summaries built in this way are useful for browsing the generated hierarchies and selecting topics of interest. Our proposal consists of a new incremental hierarchical clustering algorithm that combines partitional and agglomerative approaches, taking the main benefits of each. In addition, a new summarization method based on Testor Theory is proposed to build the topic summaries. Experimental results on the TDT2 collection demonstrate its usefulness and effectiveness not only as a topic detection system, but also as a classification and summarization tool. (c) 2006 Elsevier Ltd. All rights reserved.
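The abstract does not spell out the clustering algorithm, so the following is only a generic illustration of how an incremental (single-pass) partitional step can be combined with an agglomerative step to obtain a topic/subtopic hierarchy. It is not the authors' method: the bag-of-words vectors, the cosine measure, both thresholds, and the function names `incremental_partition` and `agglomerate` are all assumptions made for this sketch.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def incremental_partition(docs, threshold):
    """Partitional step (assumed): each arriving document joins its most
    similar existing cluster, or starts a new one if none reaches threshold."""
    clusters = []
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # summed counts act as the centroid
            best["members"].append(i)
    return clusters


def agglomerate(clusters, threshold):
    """Agglomerative step (assumed): repeatedly merge the two most similar
    nodes into a parent topic until no pair reaches threshold."""
    nodes = [dict(c, children=[]) for c in clusters]
    while len(nodes) > 1:
        best = None
        for x in range(len(nodes)):
            for y in range(x + 1, len(nodes)):
                sim = cosine(nodes[x]["centroid"], nodes[y]["centroid"])
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        sim, x, y = best
        if sim < threshold:
            break
        a, b = nodes[x], nodes[y]
        parent = {
            "centroid": a["centroid"] + b["centroid"],
            "members": sorted(a["members"] + b["members"]),
            "children": [a, b],
        }
        nodes = [n for k, n in enumerate(nodes) if k not in (x, y)] + [parent]
    return nodes


# Toy news stream (invented for illustration only).
docs = [
    "earthquake hits city rescue teams arrive",
    "rescue teams search city after earthquake",
    "election results announced president wins",
    "president wins election by wide margin",
    "earthquake aftershock damages city bridge",
]
subtopics = incremental_partition(docs, threshold=0.5)
topics = agglomerate(subtopics, threshold=0.2)
for t in topics:
    print(t["members"], [c["members"] for c in t["children"]])
```

A strict threshold in the first pass yields fine-grained subtopics as documents arrive, and a looser threshold in the second pass groups those subtopics under broader topics. A real system would also need term weighting and a way to update the hierarchy without recomputing all pairwise similarities.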
Pages: 752-768
Page count: 17
Related papers
50 items in total
  • [31] Effective Pattern Discovery for Text Mining
    Zhong, Ning
    Li, Yuefeng
    Wu, Sheng-Tang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (01) : 30 - 44
  • [32] Relevance Feature Discovery for Text Mining
    Li, Yuefeng
    Algarni, Abdulmohsen
    Albathan, Mubarak
    Shen, Yan
    Bijaksana, Moch Arif
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (06) : 1656 - 1669
  • [33] TEXT AND DATA MINING FOR BIOMEDICAL DISCOVERY
    Gonzalez, Graciela
    Cohen, Kevin Bretonnel
    Leaman, Robert
    Greene, Casey S.
    Shah, Nigam
    Kann, Maricel G.
    Ye, Jieping
    PACIFIC SYMPOSIUM ON BIOCOMPUTING 2014, 2014, : 312 - 315
  • [34] TopCat: Data mining for topic identification in a text corpus
    Clifton, C
    Cooley, R
    PRINCIPLES OF DATA MINING AND KNOWLEDGE DISCOVERY, 1999, 1704 : 174 - 183
  • [35] Joint group and topic discovery from relations and text
    McCallum, Andrew
    Wang, Xuerui
    Mohanty, Natasha
    STATISTICAL NETWORK ANALYSIS: MODELS, ISSUES, AND NEW DIRECTIONS, 2007, 4503 : 28 - +
  • [36] Building topic maps using a text mining approach
    Yang, HC
    Lee, CH
    FOUNDATIONS OF INTELLIGENT SYSTEMS, 2003, 2871 : 307 - 314
  • [37] A Short Text Topic Discovery Method for Social Network
    Liu Jia
    Wang Qinglin
    Liu Yu
    Li Yuan
    2014 33RD CHINESE CONTROL CONFERENCE (CCC), 2014, : 512 - 516
  • [38] Torpedo: Topic Periodicity Discovery from Text Data
    Wang, Jingjing
    Deng, Hongbo
    Han, Jiawei
    NEXT-GENERATION ANALYST III, 2015, 9499
  • [39] TopCat: Data mining for topic identification in a text corpus
    Clifton, C
    Cooley, R
    Rennie, J
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (08) : 949 - 964
  • [40] Text Value and Linguistic Characterization in Chinese Language Literature Based on Text Mining Techniques
    Liu M.
    Hu S.
    Qing W.
    Applied Mathematics and Nonlinear Sciences, 2024, 9 (01)