GPLDA: A Generalized Poisson Latent Dirichlet Topic Model

Citations: 0
Authors
Bala, Ibrahim Bakari [1]
Saringat, Mohd Zainuri [1]
Affiliation
[1] Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Johor Baharu, Malaysia
Keywords
Bag-of-words; generalized Poisson distribution; topic model; latent Dirichlet allocation
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
The earliest modifications of Latent Dirichlet Allocation (LDA) with respect to word or document attributes relaxed its exchangeability assumption on the bag-of-words (BoW) matrix. Several authors have since modified the original LDA, focusing on models in which the current topic depends on the words of the previous topic. Most of this earlier work ignored the document-length distribution, on the assumption that its effect would vanish at the modelling stage. In this paper, therefore, the Poisson document-length distribution of the LDA model is replaced with the Generalized Poisson (GP) distribution, which can capture more complex count structures. The main strength of GP is its ability to model both overdispersed (variance larger than the mean) and underdispersed (variance smaller than the mean) count data. The Poisson distribution used by LDA, in contrast, assumes that the mean and variance of document lengths are equal. This assumption is often unrealistic for real-life text data, in which the variance of document length may be greater or smaller than its mean. Approximate estimates of the GPLDA model parameters were obtained by maximizing the log-likelihood with the Newton-Raphson method. A comparative performance analysis of GPLDA against LDA, using accuracy and F1 score, showed improved results for GPLDA.
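The abstract does not state which GP parameterization is used or exactly which log-likelihood the Newton-Raphson step maximizes. As a minimal sketch, assuming Consul's common form P(X = x) = θ(θ + λx)^(x−1) e^(−θ−λx) / x!, with mean θ/(1 − λ) and variance θ/(1 − λ)^3, the Python below fits only the document-length component by maximum likelihood: λ > 0 gives overdispersion, λ < 0 underdispersion, and λ = 0 recovers the Poisson length model of standard LDA. Setting both partial derivatives of the log-likelihood to zero yields θ = x̄(1 − λ) plus a single equation in λ, solved here with a bisection-safeguarded Newton-Raphson iteration. This is not the authors' full GPLDA estimator; the helper names `gp_logpmf`, `gp_mean_var`, `_score_lam` and `fit_gp` and the sample lengths are hypothetical, and Consul's constraint λ ≥ −1 is not enforced.

```python
import math
from statistics import fmean, pvariance


def gp_logpmf(x: int, theta: float, lam: float) -> float:
    """Log-pmf of the generalized Poisson distribution (Consul's form, assumed):

        P(X = x) = theta * (theta + lam*x)**(x - 1) * exp(-theta - lam*x) / x!

    lam = 0 gives Poisson(theta); lam > 0 is overdispersed, lam < 0 underdispersed.
    """
    rate = theta + lam * x
    if rate <= 0:
        return float("-inf")  # past the truncation point when lam < 0
    return math.log(theta) + (x - 1) * math.log(rate) - rate - math.lgamma(x + 1)


def gp_mean_var(theta: float, lam: float) -> tuple[float, float]:
    """Mean theta/(1 - lam) and variance theta/(1 - lam)**3."""
    return theta / (1 - lam), theta / (1 - lam) ** 3


def _score_lam(lam: float, xs: list[int], xbar: float) -> tuple[float, float]:
    """H(lam) = d loglik / d lam with theta = xbar*(1 - lam) substituted, and dH/dlam."""
    h, dh = -len(xs) * xbar, 0.0
    for x in xs:
        d = xbar + lam * (x - xbar)  # equals theta + lam*x after substitution
        h += x * (x - 1) / d
        dh -= x * (x - 1) * (x - xbar) / d ** 2
    return h, dh


def fit_gp(xs: list[int], tol: float = 1e-12, max_iter: int = 200) -> tuple[float, float]:
    """Maximum-likelihood (theta, lam) for observed document lengths xs (hypothetical helper).

    Zeroing both partial derivatives of the log-likelihood gives
    theta = xbar*(1 - lam) and H(lam) = 0. H is convex on the admissible range,
    tends to +inf at its lower end and is negative as lam approaches 1, so it has
    one root, found here by Newton-Raphson safeguarded with bisection.
    Consul's extra constraint lam >= -1 is not enforced in this sketch.
    """
    xbar, s2 = fmean(xs), pvariance(xs)
    if max(xs) < 2 or s2 == 0:
        raise ValueError("need non-constant lengths with at least one length >= 2")
    lo, hi = -xbar / (max(xs) - xbar), 1.0  # keeps every xbar + lam*(x - xbar) > 0
    lam = 1.0 - math.sqrt(xbar / s2)  # method-of-moments start: var/mean = 1/(1-lam)**2
    if not lo < lam < hi:
        lam = 0.5 * (lo + hi)
    for _ in range(max_iter):
        h, dh = _score_lam(lam, xs, xbar)
        if h == 0:
            break
        if h > 0:
            lo = lam  # root lies to the right
        else:
            hi = lam  # root lies to the left
        new = lam - h / dh if dh != 0 else math.nan  # Newton-Raphson step
        if not lo < new < hi:
            new = 0.5 * (lo + hi)  # bisection fallback keeps the bracket
        converged = abs(new - lam) < tol
        lam = new
        if converged:
            break
    return xbar * (1 - lam), lam


if __name__ == "__main__":
    lengths = [3, 5, 8, 12, 20, 35, 60, 4, 7, 15]  # hypothetical document lengths
    theta, lam = fit_gp(lengths)
    mean, var = gp_mean_var(theta, lam)
    print(f"theta={theta:.3f} lam={lam:.3f} mean={mean:.2f} var={var:.2f}")
```

Because the fit enforces θ = x̄(1 − λ), the fitted mean θ/(1 − λ) reproduces the sample mean exactly, and all of the extra flexibility goes into the variance. The hypothetical lengths have a variance far above their mean, so the fitted λ comes out positive; a Poisson-like corpus should give λ near 0.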
Pages: 403-407
Page count: 5