Dynamic clustering for short text stream based on Dirichlet process

Times cited: 7
Authors
Xu, Wanyin [1]
Li, Yun [1]
Qiang, Jipeng [1]
Affiliation
[1] Yangzhou Univ, Dept Software Engn, Yangzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Short text stream; Dirichlet process; Dynamic clustering; Topic drift; SELECTION;
DOI
10.1007/s10489-021-02263-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Due to the explosive growth of short texts on social media platforms, short text stream clustering has become an increasingly important problem. Unlike traditional text streams, short text streams exhibit characteristics such as short length, weak signal, high volume, high velocity and topic drift. Existing methods cannot adequately handle two major problems at the same time: inferring the number of topics and coping with topic drift. We therefore propose a dynamic clustering algorithm for short text streams based on the Dirichlet process (DCSS), which automatically learns the number of topics in the documents and addresses the topic drift problem of short text streams. To alleviate the sparsity of short texts, DCSS exploits the correlation between topic distributions at neighbouring time points: it uses the topic distribution inferred from past documents as the prior for the topic distribution at the current time point, while allowing newly arrived documents to update the posterior distribution of topics. Experiments on two widely used datasets show that DCSS outperforms existing methods and is more stable.
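The abstract names two mechanisms: letting a Dirichlet process decide how many topics exist, and using the topic distribution inferred from past documents as the prior at the current time point. The Python code below is a minimal sketch of both ideas under simplifying assumptions; it is not the DCSS algorithm from the paper. Each incoming document is scored against every existing cluster and one potential new cluster using Chinese-restaurant-process weights and a Dirichlet-multinomial word likelihood, then assigned greedily (instead of by sampling). At the start of each batch, all past counts are scaled by a hypothetical `decay` factor so earlier topics act as a prior that new documents can override. The class name, the parameters `alpha`, `beta`, `decay` and `vocab_size`, the fixed vocabulary size and the pruning threshold are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


class StreamingDPClusterer:
    """Hypothetical sketch (not DCSS itself): one-pass clustering of tokenized
    short texts with a Dirichlet-process mixture of multinomials, where each
    batch starts from decayed statistics of earlier batches."""

    def __init__(self, alpha=0.1, beta=0.02, decay=0.5, vocab_size=100):
        self.alpha = alpha      # DP concentration: prior weight of opening a new cluster
        self.beta = beta        # symmetric Dirichlet prior on each cluster's word distribution
        self.decay = decay      # assumed factor carrying past counts into the current prior
        self.V = vocab_size     # assumed fixed vocabulary size
        self.m = defaultdict(float)                        # (decayed) documents per cluster
        self.n = defaultdict(float)                        # (decayed) word tokens per cluster
        self.nw = defaultdict(lambda: defaultdict(float))  # (decayed) word counts per cluster
        self.next_id = 0

    def start_batch(self):
        """Scale every cluster's counts by `decay` so the previous topic
        distribution acts as a prior that new documents can override."""
        for z in list(self.m):
            self.m[z] *= self.decay
            self.n[z] *= self.decay
            for w in self.nw[z]:
                self.nw[z][w] *= self.decay
            if self.m[z] < 1e-3:  # forget clusters that have faded out
                del self.m[z], self.n[z], self.nw[z]

    def _log_score(self, words, z):
        """Log of the CRP weight times the Dirichlet-multinomial predictive
        probability of `words` under cluster z (z=None means a new cluster)."""
        if z is None:
            score, nz, nwz = math.log(self.alpha), 0.0, {}
        else:
            score, nz, nwz = math.log(self.m[z]), self.n[z], self.nw[z]
        for w, c in Counter(words).items():
            for j in range(c):
                score += math.log(nwz.get(w, 0.0) + self.beta + j)
        for i in range(len(words)):
            score -= math.log(nz + self.V * self.beta + i)
        return score

    def add(self, words):
        """Assign one document greedily to the best existing or new cluster."""
        candidates = list(self.m) + [None]
        best = max(candidates, key=lambda z: self._log_score(words, z))
        if best is None:
            best = self.next_id
            self.next_id += 1
        self.m[best] += 1
        self.n[best] += len(words)
        for w in words:
            self.nw[best][w] += 1
        return best


if __name__ == "__main__":
    c = StreamingDPClusterer()
    c.start_batch()  # time point 1
    docs = (["apple", "iphone", "launch"],
            ["apple", "iphone", "price"],
            ["football", "match", "goal"])
    print([c.add(d) for d in docs])                 # [0, 0, 1]
    c.start_batch()  # time point 2: old counts halved, still act as a prior
    print(c.add(["football", "goal", "transfer"]))  # 1
```

With these settings the two phone-related documents share a cluster, the football document opens a new one, and in the next batch the decayed football cluster still attracts a related document. A larger `decay` keeps old topics influential for longer, while a smaller one lets the clustering follow topic drift more quickly.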
Pages: 4651-4662
Number of pages: 12
Related papers
50 in total
  • [21] Evaluating Short Text Stream Clustering on Large E-commerce Datasets
    Andrade, Cesar
    Ribeiro, Rita P.
    Gama, Joao
    INTELLIGENT SYSTEMS, BRACIS 2024, PT III, 2025, 15414 : 245 - 259
  • [22] Text stream clustering algorithm based on adaptive feature selection
    Gong, Linghui
    Zeng, Jianping
    Zhang, Shiyong
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (03) : 1393 - 1399
  • [23] Feature Word Vector Based on Short Text Clustering
    Liu, Xin
    Wang, Bo
    Xi, Yao-yi
    Mao, Er-song
    Ke, Sheng-cai
    Tang, Yong-wang
    COMPUTER SCIENCE AND TECHNOLOGY (CST2016), 2017: 533 - 545
  • [24] Topic Based Temporal Generative Short Text Clustering
    Smitha, E. S.
    Sendhilkumar, S.
    Mahalakshmi, G. S.
    Sanju, S. Krithika
    PROCEEDING OF THE INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS, BIG DATA AND IOT (ICCBI-2018), 2020, 31 : 912 - 922
  • [25] Model-based Clustering of Short Text Streams
    Yin, Jianhua
    Chao, Daren
    Liu, Zhongkun
    Zhang, Wei
    Yu, Xiaohui
    Wang, Jianyong
    KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018: 2634 - 2642
  • [26] Study of robot demonstration learning based on the Dirichlet process clustering
    Wu X.
    He M.
    Liu T.
    Zhang X.
    Shao G.
    Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2023, 44 (01): 265 - 274
  • [27] GOWSeqStream: an integrated sequential embedding and graph-of-words for short text stream clustering
    Vo, Tham
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (06): 4321 - 4341
  • [29] An Online Semantic-Enhanced Graphical Model for Evolving Short Text Stream Clustering
    Kumar, Jay
    Din, Salah Ud
    Yang, Qinli
    Kumar, Rajesh
    Shao, Junming
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (12) : 13809 - 13820