Evaluation of the Dirichlet Process Multinomial Mixture Model for Short-Text Topic Modeling

Cited by: 1
Authors
Karlsson, Alexander [1 ]
Duarte, Denio [2 ]
Mathiason, Gunnar [1 ]
Bae, Juhee [1 ]
Affiliations
[1] Univ Skovde, Sch Informat, Skovde, Sweden
[2] Fed Univ Fronteira Sul, Campus Chapeco, Chapeco, Brazil
Keywords
text analysis; topic modeling; Bayesian non-parametrics; Dirichlet process; short text
DOI
10.1109/ISCBI.2018.00025
CLC classification
TP301 [Theory, Methods]
Discipline code
081202
Abstract
Fast-moving trends, in both society and highly competitive business areas, call for effective methods of automatic analysis. The availability of fast-moving sources in the form of short texts, such as social media posts and blogs, allows aggregation from a vast number of text sources for an up-to-date view of trends and business insights. Topic modeling is an established approach for analyzing large amounts of text, but the scarcity of statistical information in short texts is considered a major obstacle to obtaining reliable topics from traditional models such as LDA. A range of specialized topic models has been proposed, but most of these approaches rely on rather strong parametric assumptions, such as a fixed number of topics. In contrast, recent advances in Bayesian non-parametrics suggest the Dirichlet process as a method that, given certain hyper-parameters, can adapt the number of topics to the data at hand. We perform an empirical evaluation of the Dirichlet process multinomial (unigram) mixture model against several parametric topic models initialized with different numbers of topics. The resulting models are evaluated using both direct and indirect measures that have been found to correlate well with human topic rankings. We show that the Dirichlet process multinomial mixture model is a viable option for short-text topic modeling: on average it performs better than, or nearly as well as, the parametric alternatives, while reducing parameter-setting requirements and thereby eliminating the need for expensive preprocessing.
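The model described in the abstract admits a compact collapsed Gibbs sampler: each document belongs to exactly one cluster, and the Dirichlet process prior lets the cluster count grow or shrink as documents are reassigned. The sketch below is illustrative only, not the authors' implementation; the function name `dpmm_gibbs` and the hyper-parameter defaults (`alpha` for the DP concentration, `beta` for a symmetric Dirichlet prior over words) are assumptions for the example.

```python
import math
import random
from collections import Counter, defaultdict

def dpmm_gibbs(docs, alpha=1.0, beta=0.1, iters=30, seed=0):
    """Collapsed Gibbs sampler for a Dirichlet process multinomial
    (unigram) mixture over tokenized documents.  Minimal sketch:
    the number of clusters adapts to the data instead of being fixed."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})  # vocabulary size
    z = [-1] * len(docs)                   # cluster assignment per doc
    m = Counter()                          # documents per cluster
    nk = Counter()                         # tokens per cluster
    nkw = defaultdict(Counter)             # per-cluster word counts
    next_id = 0                            # id reserved for a new cluster

    def log_pred(d_counts, Nd, k):
        # Dirichlet-multinomial predictive likelihood of the doc under cluster k
        kw, total = nkw.get(k, {}), nk[k]
        lp = 0.0
        for w, c in d_counts.items():
            base = kw.get(w, 0) + beta
            for j in range(c):
                lp += math.log(base + j)
        for i in range(Nd):
            lp -= math.log(total + V * beta + i)
        return lp

    for _ in range(iters):
        for d, doc in enumerate(docs):
            cnts, Nd = Counter(doc), len(doc)
            k_old = z[d]
            if k_old != -1:                # remove doc from its current cluster
                m[k_old] -= 1
                nk[k_old] -= Nd
                nkw[k_old].subtract(cnts)
                if m[k_old] == 0:          # cluster died: drop it entirely
                    del m[k_old], nk[k_old], nkw[k_old]
            # candidates: every existing cluster plus one brand-new cluster
            ks = list(m) + [next_id]
            logp = [(math.log(m[k]) if k in m else math.log(alpha))
                    + log_pred(cnts, Nd, k) for k in ks]
            mx = max(logp)
            k_new = rng.choices(ks, weights=[math.exp(l - mx) for l in logp])[0]
            if k_new == next_id:
                next_id += 1
            z[d] = k_new
            m[k_new] += 1
            nk[k_new] += Nd
            nkw[k_new].update(cnts)
    return z, m
```

On a toy corpus such as `[["cat", "cat", "dog"], ["dog", "cat"], ["stock", "market", "stock"], ["market", "stock"]]` the sampler typically settles on a small number of clusters without being told how many to use, which is the self-adaptation property the abstract attributes to the Dirichlet process.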
Pages: 79-83
Page count: 5
Related Papers
50 entries in total
  • [31] A tutorial on Dirichlet process mixture modeling
    Li, Yuelin
    Schofield, Elizabeth
    Gonen, Mithat
    JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2019, 91 : 128 - 144
  • [32] Multinomial mixture model with feature selection for text clustering
    Li, Minqiang
    Zhang, Liang
    KNOWLEDGE-BASED SYSTEMS, 2008, 21 (07) : 704 - 708
  • [33] An effective short-text topic modelling with neighbourhood assistance-driven NMF in Twitter
    Shalani Athukorala
    Wathsala Mohotti
    Social Network Analysis and Mining, 2022, 12
  • [34] Large Margin Multinomial Mixture Model for Text Categorization
    Pan, Zhen-Yu
    Jiang, Hui
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 1566+
  • [35] Relevance-Promoting Language Model for Short-Text Conversation
    Li, Xin
    Li, Piji
    Bi, Wei
    Liu, Xiaojiang
    Lam, Wai
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8253 - 8260
  • [37] STTM: A tool for short text topic modeling
    Qiang, Jipeng
    Li, Yun
    Yuan, Yunhao
    Liu, Wei
    Wu, Xindong
    arXiv, 2018,
  • [39] A Short-Text Similarity Model Combining Semantic and Syntactic Information
    Zhou, Ya
    Li, Cheng
    Huang, Guimin
    Guo, Qingkai
    Li, Hui
    Wei, Xiong
    ELECTRONICS, 2023, 12 (14)
  • [40] Flow Hierarchical Dirichlet Process for Complex Topic Modeling
    Han Z.-M.
    Zhang M.-M.
    Li M.-Q.
    Duan D.-G.
    Chen Y.
    Jisuanji Xuebao/Chinese Journal of Computers, 2019, 42 (07): : 1539 - 1552