Dynamic clustering for short text stream based on Dirichlet process

Cited by: 7
Authors
Xu, Wanyin [1]
Li, Yun [1]
Qiang, Jipeng [1]
Affiliation
[1] Yangzhou Univ, Dept Software Engn, Yangzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Short text stream; Dirichlet process; Dynamic clustering; Topic drift; SELECTION;
DOI
10.1007/s10489-021-02263-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Due to the explosive growth of short text on various social media platforms, short text stream clustering has become an increasingly prominent issue. Unlike traditional text streams, short text stream data present the following characteristics: short length, weak signal, high volume, high velocity, topic drift, etc. Existing methods cannot simultaneously address two major problems very well: inferring the number of topics and topic drift. Therefore, we propose a dynamic clustering algorithm for short text streams based on the Dirichlet process (DCSS), which can automatically learn the number of topics in documents and solve the topic drift problem of short text streams. To solve the sparsity problem of short texts, DCSS considers the correlation of the topic distribution at neighbouring time points and uses the inferred topic distribution of past documents as a prior of the topic distribution at the current moment while simultaneously allowing newly streamed documents to change the posterior distribution of topics. We conduct experiments on two widely used datasets, and the results show that DCSS outperforms existing methods and has better stability.
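The abstract's idea of using past topic information as a prior while still allowing new topics can be illustrated with a Chinese-restaurant-process style weighting. The sketch below is not the DCSS inference procedure, which the abstract does not specify; the function name `crp_topic_prior`, the `decay` factor applied to the previous time slice, and the `NEW_TOPIC` key are all assumptions made for illustration.

```python
# Hypothetical illustration only: a Dirichlet-process (CRP) prior over topics
# in which counts from the previous time slice are carried over with a decay.
# The decay scheme is an assumption, not taken from the paper.

NEW_TOPIC = "new"  # hypothetical key standing for "open a new topic"


def crp_topic_prior(current_counts, previous_counts, alpha=1.0, decay=0.5):
    """Return a normalised prior over existing topics plus one new topic.

    current_counts:  {topic_id: documents assigned so far in this time slice}
    previous_counts: {topic_id: documents assigned in the previous time slice}
    alpha:           concentration parameter (mass reserved for a new topic)
    decay:           weight in [0, 1] given to the previous slice's counts
    """
    topics = sorted(set(current_counts) | set(previous_counts))
    # Pseudo-count of each topic: fresh evidence plus discounted history.
    mass = {
        k: current_counts.get(k, 0) + decay * previous_counts.get(k, 0)
        for k in topics
    }
    total = sum(mass.values()) + alpha
    prior = {k: m / total for k, m in mass.items()}
    prior[NEW_TOPIC] = alpha / total  # probability of creating a new topic
    return prior


if __name__ == "__main__":
    prior = crp_topic_prior({0: 2}, {0: 2, 1: 4}, alpha=1.0, decay=0.5)
    for topic, p in prior.items():
        print(topic, round(p, 3))
```

In a full model this prior would be multiplied by a word likelihood for the incoming document before sampling a topic. Because the new-topic mass is always positive, the number of topics is not fixed in advance, and a topic that receives no new documents loses weight over successive slices under the assumed decay.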
Pages: 4651-4662
Page count: 12
Related papers
50 in total
  • [1] Dynamic clustering for short text stream based on Dirichlet process
    Wanyin Xu
    Yun Li
    Jipeng Qiang
    Applied Intelligence, 2022, 52 : 4651 - 4662
  • [2] A Dirichlet process biterm-based mixture model for short text stream clustering
    Chen, Junyang
    Gong, Zhiguo
    Liu, Weiwen
    APPLIED INTELLIGENCE, 2020, 50 (05) : 1609 - 1619
  • [3] A Dirichlet process biterm-based mixture model for short text stream clustering
    Junyang Chen
    Zhiguo Gong
    Weiwen Liu
    Applied Intelligence, 2020, 50 : 1609 - 1619
  • [4] A topic-enhanced dirichlet model for short text stream clustering
    Liu, Kan
    He, Jiarui
    Chen, Yu
    NEURAL COMPUTING & APPLICATIONS, 2024, : 8125 - 8140
  • [5] An Online Dirichlet Model based on Sentence Embedding and DBSCAN for Noisy Short Text Stream Clustering
    Si, XianLiang
    Li, Peipei
    Hu, Xuegang
    Zhang, Yuhong
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [6] An Online Semantic-enhanced Dirichlet Model for Short Text Stream Clustering
    Kumar, Jay
    Shao, Junming
    Din, Salah ud
    Ali, Wazir
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 766 - 776
  • [7] A Topic-based Dynamic Clustering Algorithm for Text Stream
    Rao, Y.
    Li, X. J.
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRIAL ENGINEERING (AIIE 2015), 2015, 123 : 480 - 483
  • [8] A Dirichlet Multinomial Mixture Model-based Approach for Short Text Clustering
    Yin, Jianhua
    Wang, Jianyong
    PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014, : 233 - 242
  • [9] Dirichlet Process Based Evolutionary Clustering
    Xu, Tianbing
    Zhang, Zhongfei
    Yu, Philip S.
    Long, Bo
    ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, : 648 - +
  • [10] An Adaptive Dirichlet Multinomial Mixture Model for Short Text Streaming Clustering
    Duan, Ruting
    Li, Chunping
    2018 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2018), 2018, : 49 - 55