A comparison of the performance of latent Dirichlet allocation and the Dirichlet multinomial mixture model on short text

Citations: 0
Authors
Mazarura, Jocelyn [1]
de Waal, Alta [1]
Affiliations
[1] Univ Pretoria, Dept Stat, Pretoria, South Africa
Keywords
Dirichlet multinomial mixture model; LDA; Short text; Topic coherence; Topic modeling; Topic stability
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The expansion of the World Wide Web and the growing popularity of microblogging websites such as Twitter and Facebook have created massive stores of short textual data. Although traditional topic models have proven successful on collections of long texts such as books and news articles, they tend to produce less coherent results when applied to short texts such as status messages and product reviews. Analysing short texts has become increasingly relevant over the last few decades, as such collections can hold useful information. Latent Dirichlet allocation (LDA), one of the most popular topic models, makes the generative assumption that each document contains multiple topics in varying proportions, which is a sensible assumption for long texts. In contrast, the Dirichlet multinomial mixture model (GSDMM), a seemingly less popular topic model, assumes that each document belongs to exactly one topic, which appears more appropriate for short texts. The objective of this paper is to investigate the hypothesis that GSDMM will outperform LDA on short text, using topic coherence and topic stability as performance measures.
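To make the contrast in generative assumptions concrete, the sketch below implements a minimal collapsed Gibbs sampler for the Dirichlet multinomial mixture model, the single-topic-per-document sampler that GSDMM is based on. It is an illustrative simplification rather than the authors' implementation; the function name gibbs_dmm, the toy corpus, and the hyperparameter values (K, alpha, beta, n_iters) are assumptions introduced here for demonstration.

import random
from collections import Counter

def gibbs_dmm(docs, K=4, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Collapsed Gibbs sampler for the Dirichlet multinomial mixture model.

    Unlike LDA, each document is assigned to exactly one topic, so the
    sampler re-draws a single topic label per document on every sweep.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})   # vocabulary size

    z = [rng.randrange(K) for _ in docs]        # one topic label per document
    m_k = Counter(z)                            # documents assigned to topic k
    n_k = [0] * K                               # total words assigned to topic k
    n_kw = [Counter() for _ in range(K)]        # per-topic word counts
    for d, doc in enumerate(docs):
        n_k[z[d]] += len(doc)
        n_kw[z[d]].update(doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            # Remove document d from its current topic.
            k_old = z[d]
            m_k[k_old] -= 1
            n_k[k_old] -= len(doc)
            n_kw[k_old].subtract(doc)

            # p(z_d = k | rest): topic popularity times how well the
            # topic's word distribution explains the whole document.
            word_counts = Counter(doc)
            probs = []
            for k in range(K):
                p = m_k[k] + alpha
                for w, c in word_counts.items():
                    for j in range(c):
                        p *= n_kw[k][w] + beta + j
                for i in range(len(doc)):
                    p /= n_k[k] + V * beta + i
                probs.append(p)

            # Sample a new topic for the document and add it back.
            r = rng.random() * sum(probs)
            k_new, acc = K - 1, 0.0
            for k, p in enumerate(probs):
                acc += p
                if r <= acc:
                    k_new = k
                    break
            z[d] = k_new
            m_k[k_new] += 1
            n_k[k_new] += len(doc)
            n_kw[k_new].update(doc)

    return z, n_kw

# Toy short-text corpus (assumed to be whitespace-tokenised and lower-cased).
corpus = [
    "lda assigns several topics to one document",
    "gibbs sampling for the dirichlet multinomial mixture",
    "short status message posted on twitter",
    "short product review text",
]
docs = [doc.split() for doc in corpus]
labels, topic_word_counts = gibbs_dmm(docs, K=2, n_iters=50)
print(labels)   # e.g. [0, 0, 1, 1]: exactly one topic label per document

An analogous LDA run (for example with gensim's LdaModel) would instead return a distribution over topics for every document; comparing the two kinds of output under topic coherence and stability measures is the experiment the abstract describes.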
Pages: 6