Dynamic clustering for short text stream based on Dirichlet process

Cited by: 7
Authors
Xu, Wanyin [1]
Li, Yun [1]
Qiang, Jipeng [1]
Affiliations
[1] Yangzhou Univ, Dept Software Engn, Yangzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Short text stream; Dirichlet process; Dynamic clustering; Topic drift; SELECTION;
DOI
10.1007/s10489-021-02263-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Due to the explosive growth of short text on various social media platforms, short text stream clustering has become an increasingly prominent issue. Unlike traditional text streams, short text stream data present the following characteristics: short length, weak signal, high volume, high velocity, topic drift, etc. Existing methods cannot simultaneously address two major problems very well: inferring the number of topics and topic drift. Therefore, we propose a dynamic clustering algorithm for short text streams based on the Dirichlet process (DCSS), which can automatically learn the number of topics in documents and solve the topic drift problem of short text streams. To solve the sparsity problem of short texts, DCSS considers the correlation of the topic distribution at neighbouring time points and uses the inferred topic distribution of past documents as a prior of the topic distribution at the current moment while simultaneously allowing newly streamed documents to change the posterior distribution of topics. We conduct experiments on two widely used datasets, and the results show that DCSS outperforms existing methods and has better stability.
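The abstract's idea of using the topic distribution inferred from past documents as a prior for the current moment can be illustrated with a small sketch. This is not the DCSS algorithm itself, whose details are not given in this record. It is only a Chinese-restaurant-process-style prior under stated assumptions: the function name `crp_prior_with_history`, the `decay` weight applied to past topic sizes, the concentration parameter `alpha`, and the `NEW_TOPIC` sentinel are all hypothetical choices made for illustration.

```python
from typing import Dict, Hashable

# Hypothetical sentinel key for "open a brand-new topic".
NEW_TOPIC = "<new>"


def crp_prior_with_history(
    prev_counts: Dict[Hashable, float],
    curr_counts: Dict[Hashable, float],
    alpha: float = 1.0,
    decay: float = 0.5,
) -> Dict[Hashable, float]:
    """Prior probability that the next document joins each topic.

    prev_counts: topic sizes inferred at the previous time point.
    curr_counts: topic sizes among documents already seen at the current time point.
    alpha:       concentration parameter; larger values favour new topics.
    decay:       assumed weight in [0, 1] given to the previous time point.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")

    # Combine discounted historical mass with current mass for every topic.
    weights: Dict[Hashable, float] = {}
    for topic in set(prev_counts) | set(curr_counts):
        w = decay * prev_counts.get(topic, 0.0) + curr_counts.get(topic, 0.0)
        if w > 0:
            weights[topic] = w

    # Existing topics are chosen in proportion to their weight,
    # a new topic in proportion to alpha.
    total = sum(weights.values()) + alpha
    probs = {topic: w / total for topic, w in weights.items()}
    probs[NEW_TOPIC] = alpha / total
    return probs


if __name__ == "__main__":
    prior = crp_prior_with_history(
        prev_counts={"sports": 8, "politics": 2},
        curr_counts={"sports": 1},
    )
    for topic, p in sorted(prior.items(), key=lambda kv: -kv[1]):
        print(f"{topic}: {p:.3f}")
```

In a full model this prior would be multiplied by a likelihood of the document's words under each topic. Setting `decay` to 0 discards the history entirely, while 1 treats past and current documents alike, which is one simple way to think about how quickly a model could follow topic drift.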
Pages: 4651-4662
Page count: 12