INCREMENTAL CLUSTERING IN SHORT TEXT STREAMS BASED ON BM25

Cited by: 0
Authors
Xu, Lixin [1]
Chen, Guang [1]
Yang, Lei [1]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing 100876, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Short text stream; Incremental clustering; BM25; Cluster cohesion; Keyword similarity
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Short texts contain few keywords and have sparse features, which gives rise to the similarity drift problem. Traditional clustering algorithms are usually ineffective and wasteful of resources when applied to short text streams. To overcome these problems, this paper proposes an incremental clustering algorithm for short text streams based on BM25. The approach uses BM25 to extract the keywords of each cluster together with their weights, and applies these extracted parameters to the similarity calculation. Theoretical analysis and experiments show that, compared with traditional clustering algorithms, the proposed algorithm handles the similarity drift problem well and achieves satisfactory accuracy and performance on short text stream clustering.
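The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of one way the described idea could look, not the authors' implementation. It assumes that each cluster is treated as a single BM25 "document", that a non-negative IDF variant is used, and that similarity is the share of a cluster's keyword weight covered by the incoming text. The names `Cluster`, `bm25_keywords`, `similarity`, `assign`, `TOP_K`, and `THRESHOLD` are hypothetical, and the parameter values are illustrative defaults rather than values from the paper.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75   # common BM25 defaults (assumption, not from the paper)
TOP_K = 5           # hypothetical number of keywords kept per cluster
THRESHOLD = 0.3     # hypothetical similarity cut-off for joining a cluster


class Cluster:
    """Running term statistics for one cluster."""

    def __init__(self):
        self.tf = Counter()  # term frequencies over all texts in the cluster

    def add(self, tokens):
        self.tf.update(tokens)

    @property
    def length(self):
        return sum(self.tf.values())


def bm25_keywords(cluster, clusters):
    """Return the TOP_K terms of `cluster` with their BM25 weights.

    Assumption: every cluster plays the role of one BM25 document.
    """
    n = len(clusters)
    avg_len = sum(c.length for c in clusters) / n
    weights = {}
    for term, tf in cluster.tf.items():
        df = sum(1 for c in clusters if term in c.tf)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        norm = tf + K1 * (1 - B + B * cluster.length / avg_len)
        weights[term] = idf * tf * (K1 + 1) / norm
    top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:TOP_K]
    return dict(top)


def similarity(token_set, cluster, clusters):
    """Share of the cluster's keyword weight covered by the text (0 to 1)."""
    keywords = bm25_keywords(cluster, clusters)
    total = sum(keywords.values())
    if total == 0:
        return 0.0
    return sum(w for t, w in keywords.items() if t in token_set) / total


def assign(text, clusters):
    """Add `text` to the most similar cluster, or open a new one.

    Returns the index of the cluster the text ended up in.
    """
    tokens = text.lower().split()
    token_set = set(tokens)
    best, best_sim = None, 0.0
    for i, c in enumerate(clusters):
        sim = similarity(token_set, c, clusters)
        if sim > best_sim:
            best, best_sim = i, sim
    if best is None or best_sim < THRESHOLD:
        clusters.append(Cluster())
        best = len(clusters) - 1
    clusters[best].add(tokens)
    return best


clusters = []
stream = [
    "bm25 ranking function",
    "football match tonight",
    "bm25 ranking tuning",
    "football score tonight",
]
labels = [assign(text, clusters) for text in stream]
print(labels)  # [0, 1, 0, 1]
```

Because each text is processed once and only the per-cluster term counts are kept, the sketch never re-clusters earlier texts. Keyword weights are recomputed on every comparison for simplicity; a practical version would likely cache them.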
Pages: 8-12
Page count: 5
Related papers
  • [1] OnSeS: A Novel Online Short Text Summarization based on BM25 and Neural Network
    Niu, Jianwei
    Zhao, Qingjuan
    Wang, Lei
    Chen, Huan
    Atiquzzaman, Mohammed
    Peng, Fei
    [J]. 2016 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2016
  • [2] Opinion Summarization for Short Texts based on BM25 and Syntactic Parsing
    Niu, Jianwei
    Zhao, Qingjuan
    Wang, Lei
    Chen, Huan
    Zheng, Shichao
    [J]. 2016 IEEE 14TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2016: 1177 - 1180
  • [3] Improving document clustering using Okapi BM25 feature weighting
    Whissell, John S.
    Clarke, Charles L. A.
    [J]. INFORMATION RETRIEVAL, 2011, 14 (05): 466 - 487
  • [4] Improving document clustering using Okapi BM25 feature weighting
    John S. Whissell
    Charles L. A. Clarke
    [J]. Information Retrieval, 2011, 14 : 466 - 487
  • [5] Injecting the BM25 Score as Text Improves BERT-Based Re-rankers
    Askari, Arian
    Abolghasemi, Amin
    Pasi, Gabriella
    Kraaij, Wessel
    Verberne, Suzan
    [J]. ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT I, 2023, 13980 : 66 - 83
  • [6] Model-based Clustering of Short Text Streams
    Yin, Jianhua
    Chao, Daren
    Liu, Zhongkun
    Zhang, Wei
    Yu, Xiaohui
    Wang, Jianyong
    [J]. KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018: 2634 - 2642
  • [7] Field-weighted XML retrieval based on BM25
    Lu, Wei
    Robertson, Stephen
    MacFarlane, Andrew
    [J]. ADVANCES IN XML INFORMATION RETRIEVAL AND EVALUATION, 2006, 3977 : 161 - 171
  • [8] Incremental autoencoders for text streams clustering in social networks
    Rekik, Amal
    Jamoussi, Salma
    [J]. JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2021, 27 (11) : 1203 - 1221
  • [9] BM25t: a BM25 extension for focused information retrieval
    Gery, Mathias
    Largeron, Christine
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 32 (01): 217 - 241
  • [10] Bug report quality detection based on the BM25 algorithm
    Chen L.
    Huang S.
    Sun J.
    Hui Z.
    Wu K.
    [J]. Qinghua Daxue Xuebao/Journal of Tsinghua University, 2020, 60 (10): 829 - 836