Evaluation of the Dirichlet Process Multinomial Mixture Model for Short-Text Topic Modeling

Cited by: 1
Authors
Karlsson, Alexander [1 ]
Duarte, Denio [2 ]
Mathiason, Gunnar [1 ]
Bae, Juhee [1 ]
Affiliations
[1] Univ Skovde, Sch Informat, Skovde, Sweden
[2] Fed Univ Fronteira Sul, Campus Chapeco, Chapeco, Brazil
Keywords
text analysis; topic modeling; Bayesian non-parametrics; Dirichlet process; short text
DOI
10.1109/ISCBI.2018.00025
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Fast-moving trends, both in society and in highly competitive business areas, call for effective methods for automatic analysis. The availability of fast-moving sources in the form of short texts, such as social media and blogs, allows aggregation from a vast number of text sources for an up-to-date view of trends and business insights. Topic modeling is an established approach for analyzing large amounts of text, but the scarcity of statistical information in short texts is considered a major obstacle to obtaining reliable topics from traditional models such as LDA. A range of specialized topic models has been proposed, but a majority of these approaches rely on rather strong parametric assumptions, such as a fixed number of topics. In contrast, recent advances in the field of Bayesian non-parametrics suggest the Dirichlet process as a method that, given certain hyper-parameters, can self-adapt to the number of topics in the data at hand. We perform an empirical evaluation of the Dirichlet process multinomial (unigram) mixture model against several parametric topic models, initialized with different numbers of topics. The resulting models are evaluated using both direct and indirect measures that have been found to correlate well with human topic rankings. We show that the Dirichlet process multinomial mixture model is a viable option for short-text topic modeling, since on average it performs better than, or nearly as well as, the parametric alternatives, while reducing parameter-setting requirements and thereby eliminating the need for expensive preprocessing.
Pages: 79-83
Number of pages: 5