Clustering Text Data Streams

Cited by: 8
Authors
刘玉葆 [1]
蔡嘉荣 [1]
印鉴 [1]
傅蔚慈 [2]
Affiliations
[1] Department of Computer Science, Sun Yat-Sen University
[2] Department of Computer Science and Engineering, The Chinese University of Hong Kong
Funding
National Natural Science Foundation of China
Keywords
clustering; database applications; data mining; text data streams
DOI
Not available
Chinese Library Classification (CLC) number
TP391.1 [Text information processing]
Discipline classification code
Abstract
Clustering text data streams is an important issue in the data mining community and has a number of applications, such as newsgroup filtering, text crawling, document organization, and topic detection and tracking. However, most existing methods are similarity-based approaches that use only the TF*IDF scheme to represent the semantics of text data, which often leads to poor clustering quality. Recently, researchers have argued that the semantic smoothing model is more effective than the TF*IDF scheme for improving text clustering quality. However, the existing semantic smoothing model is not suitable for a dynamic text data context. In this paper, we first extend the semantic smoothing model to the text data stream context. Based on the extended model, we then present two online clustering algorithms, OCTS and OCTSM, for clustering massive text data streams. In both algorithms, we also present a new cluster statistics structure, named the cluster profile, which captures the semantics of text data streams dynamically and at the same time speeds up the clustering process. Some efficient implementations of our algorithms are also given. Finally, we present a series of experimental results illustrating the effectiveness of our technique.
Pages: 112-128
Number of pages: 17