Trend-based Document Clustering for Sensitive and Stable Topic Detection

Cited by: 0
Authors
Sato, Yoshihide [1]
Kawashima, Harumi [2]
Okuda, Hidenori [2]
Oku, Masahiro [2]
Affiliations
[1] NTT Corp, NTT West Corp, 1-1 Hikarino Oka, Yokosuka, Kanagawa 2390847, Japan
[2] NTT Corp, NTT Cyber Solut Labs, Yokosuka, Kanagawa 2390847, Japan
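Keywords
trend; clustering; gradient model; word frequency
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract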
The ability to detect and track new topics is important given the huge volumes of documents. This paper introduces a trend-based document clustering algorithm for analyzing them. Its key characteristic is that it scores words on the basis of fluctuations in word frequency. The algorithm generates clusters in a practical time, with O(n) processing cost, owing to preliminary calculation of document distances. This attribute allows the user to settle on the best level of granularity for identifying topics. Experiments show that the algorithm gathers relevant documents with an average F-measure of 63.0% from the beginning to the end of a topic's lifetime, largely surpassing other algorithms.
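As a rough illustration of scoring words by the fluctuation in their frequency, the sketch below assigns each word the least-squares slope of its relative frequency across consecutive time windows. This is an assumption made for illustration only: the abstract does not give the scoring formula, and the function name trend_scores, the windowing scheme, and the use of a linear slope are all hypothetical rather than the authors' method.

```python
from collections import Counter


def trend_scores(windows):
    """Score words by how their relative frequency changes over time.

    windows: list of time windows (oldest first); each window is a list
    of tokenized documents (lists of words). Hypothetical input format.
    Returns {word: slope}; positive means the word is rising.
    """
    if not windows:
        return {}

    # Relative frequency of each word within each time window.
    freqs = []
    for docs in windows:
        counts = Counter(word for doc in docs for word in doc)
        total = sum(counts.values()) or 1
        freqs.append({w: c / total for w, c in counts.items()})

    n = len(freqs)
    t_mean = (n - 1) / 2
    # Guard against a single window, where the slope is undefined.
    denom = sum((t - t_mean) ** 2 for t in range(n)) or 1.0

    vocab = set().union(*freqs)
    scores = {}
    for w in vocab:
        series = [f.get(w, 0.0) for f in freqs]
        f_mean = sum(series) / n
        # Least-squares slope of frequency against window index
        # (an assumed stand-in for the paper's unspecified score).
        scores[w] = sum(
            (t - t_mean) * (x - f_mean) for t, x in enumerate(series)
        ) / denom
    return scores
```

Words with a large positive score are gaining frequency and could be weighted more heavily when comparing documents; how the paper actually combines such scores into document distances is not stated in the abstract.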
Pages: 331 / +
Page count: 2