Evaluation of the Dirichlet Process Multinomial Mixture Model for Short-Text Topic Modeling

Cited by: 1
Authors
Karlsson, Alexander [1]
Duarte, Denio [2]
Mathiason, Gunnar [1]
Bae, Juhee [1]
Affiliations
[1] Univ Skovde, Sch Informat, Skovde, Sweden
[2] Fed Univ Fronteira Sul, Campus Chapeco, Chapeco, Brazil
Keywords
text analysis; topic modeling; Bayesian non-parametrics; Dirichlet process; short text
DOI
10.1109/ISCBI.2018.00025
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Fast-moving trends, both in society and in highly competitive business areas, call for effective methods of automatic analysis. Fast-moving sources of short texts, such as social media and blogs, make it possible to aggregate a vast number of texts into an up-to-date view of trends and business insights. Topic modeling is an established approach for analyzing large amounts of text, but the scarcity of statistical information in short texts is considered a major obstacle to obtaining reliable topics from traditional models such as LDA. A range of specialized topic models has been proposed, but most of these approaches rely on rather strong parametric assumptions, such as a fixed number of topics. In contrast, recent advances in Bayesian non-parametrics suggest the Dirichlet process as a method that, given certain hyper-parameters, can adapt on its own to the number of topics in the data at hand. We perform an empirical evaluation of the Dirichlet process multinomial (unigram) mixture model against several parametric topic models, initialized with different numbers of topics. The resulting models are evaluated using both direct and indirect measures that have been found to correlate well with human topic rankings. We show that the Dirichlet Process Multinomial Mixture model is a viable option for short-text topic modeling, since on average it performs better than, or nearly as well as, the parametric alternatives, while reducing parameter-setting requirements and thereby eliminating the need for expensive preprocessing.
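The abstract's remark that a Dirichlet process can adapt to the number of topics can be illustrated with its Chinese restaurant process view. The sketch below is not the paper's model or inference procedure: it only samples cluster assignments from the prior, ignoring the words in each document. The function name `crp_table_sizes` and the parameter name `alpha` (the concentration hyper-parameter) are assumptions made for this illustration.

```python
import random


def crp_table_sizes(n_docs, alpha, seed=0):
    """Assign n_docs documents to clusters under a Chinese restaurant process.

    Returns a list whose k-th entry is the number of documents in cluster k.
    The number of clusters is not fixed in advance; it emerges from sampling.
    """
    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of documents currently in cluster k
    for _ in range(n_docs):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1    # join an existing cluster
    return sizes


if __name__ == "__main__":
    # Larger alpha tends to produce more clusters for the same number of documents.
    for alpha in (0.1, 1.0, 10.0):
        print(alpha, len(crp_table_sizes(1000, alpha)))
```

In a full Dirichlet process multinomial mixture, these prior weights would be multiplied by the likelihood of each document's words under each cluster, so the data as well as `alpha` determine how many topics are used.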
Pages: 79-83
Number of pages: 5