GPLDA: A Generalized Poisson Latent Dirichlet Topic Model

Cited by: 0
Authors
Bala, Ibrahim Bakari [1]
Saringat, Mohd Zainuri [1]
Affiliation
[1] Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Johor Baharu, Malaysia
Keywords
Bag-of-words; generalized Poisson distribution; topic model; latent Dirichlet allocation
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The earliest modifications of Latent Dirichlet Allocation (LDA) with respect to word or document attributes relaxed its exchangeability assumption, which is expressed through the Bag-of-words (BoW) matrix. Several authors have since modified the original LDA with models that assume the current topic depends on the words of the previous topic. Most of this earlier work ignored the document length distribution, on the assumption that it drops out at the modelling stage. In this paper, the Poisson document length distribution of the LDA model is therefore replaced with the Generalized Poisson (GP) distribution, which can capture more complex structure. The main strength of the GP distribution is that it models both overdispersed (variance larger than the mean) and underdispersed (variance smaller than the mean) count data. The Poisson distribution used by LDA relies on the assumption that the mean and variance of document lengths are equal. This assumption is often unrealistic for real-life text data, where the variance of document length may be greater or smaller than the mean. Approximate estimates of the GPLDA model parameters were obtained by applying the Newton-Raphson method to the log-likelihood. A comparative analysis of GPLDA and LDA using accuracy and F1 score showed improved results for GPLDA.
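The abstract gives no formulas, so the following Python sketch shows one way the document-length component could work. It assumes the Consul-Jain parameterization of the GP distribution, P(N = n) = θ(θ + λn)^(n-1) e^(-θ-λn) / n!, whose mean is θ/(1-λ) and variance θ/(1-λ)^3. Under this form, λ > 0 gives overdispersion, λ < 0 gives underdispersion, and λ = 0 recovers the Poisson distribution. The paper may use a different parameterization or estimation routine. The function names `gp_logpmf`, `gp_mean_var` and `fit_gp_newton` are hypothetical. The fit applies Newton-Raphson to the log-likelihood of observed document lengths. For λ < 0 it ignores the truncation of the support, which is a common simplification.

```python
import math
from statistics import fmean, pvariance

# Assumption: Consul-Jain generalized Poisson form; the GPLDA paper may differ.


def gp_logpmf(n: int, theta: float, lam: float) -> float:
    """log P(N = n) = log[theta * (theta + lam*n)**(n-1) * exp(-(theta + lam*n)) / n!].

    Returns -inf where theta + lam*n <= 0, i.e. outside the support when lam < 0.
    """
    rate = theta + lam * n
    if rate <= 0:
        return -math.inf
    return math.log(theta) + (n - 1) * math.log(rate) - rate - math.lgamma(n + 1)


def gp_mean_var(theta: float, lam: float) -> tuple[float, float]:
    """Mean theta/(1-lam) and variance theta/(1-lam)**3 of the GP distribution."""
    return theta / (1 - lam), theta / (1 - lam) ** 3


def fit_gp_newton(lengths: list[int], tol: float = 1e-10,
                  max_iter: int = 100) -> tuple[float, float]:
    """Hypothetical helper: ML estimates (theta, lam) from document lengths.

    Setting both score equations to zero gives theta = mean * (1 - lam);
    substituting it into the lam score leaves one equation in lam,
        H(lam) = sum_i x_i (x_i - 1) / (mean + lam (x_i - mean)) - n * mean = 0,
    which is solved with Newton-Raphson. Truncation for lam < 0 is ignored.
    """
    n = len(lengths)
    xbar = fmean(lengths)
    s2 = pvariance(lengths)
    if s2 == 0:
        raise ValueError("document lengths have zero variance")
    big = [x for x in lengths if x >= 2]  # x = 0 and x = 1 contribute nothing to H

    def denom(x: int, lmb: float) -> float:
        return xbar + lmb * (x - xbar)  # equals theta + lmb * x when theta = xbar(1 - lmb)

    def feasible(lmb: float) -> bool:
        return lmb < 1 and all(denom(x, lmb) > 0 for x in big)

    # Method-of-moments start from variance / mean = 1 / (1 - lam)**2.
    lam = 1.0 - math.sqrt(xbar / s2)
    if not feasible(lam):
        lam = 0.0  # fall back to the Poisson case
    for _ in range(max_iter):
        h = sum(x * (x - 1) / denom(x, lam) for x in big) - n * xbar
        dh = -sum(x * (x - 1) * (x - xbar) / denom(x, lam) ** 2 for x in big)
        if dh == 0:
            break
        step = h / dh
        while not feasible(lam - step):  # halve the step until every term stays positive
            step /= 2
        lam -= step
        if abs(step) < tol:
            break
    return xbar * (1 - lam), lam
```

Eliminating θ through the identity θ = x̄(1 - λ), which follows from the two score equations, reduces the fit to a one-dimensional root search. Each Newton-Raphson step therefore needs only H(λ) and its derivative, and the step-halving guard keeps every θ + λx term positive. In an LDA-style generative process, the fitted (θ, λ) would presumably take the place of the Poisson rate used to draw each document's length N.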
Pages: 403-407
Number of pages: 5
Related papers (50 in total)
  • [31] Stochastic Variational Optimization of a Hierarchical Dirichlet Process Latent Beta-Liouville Topic Model
    Ihou, Koffi Eddy
    Amayri, Manar
    Bouguila, Nizar
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2022, 16 (05)
  • [32] Topic Modeling Twitter Data Using Latent Dirichlet Allocation and Latent Semantic Analysis
    Qomariyah, Siti
    Iriawan, Nur
    Fithriasari, Kartika
    2ND INTERNATIONAL CONFERENCE ON SCIENCE, MATHEMATICS, ENVIRONMENT, AND EDUCATION, 2019, 2019, 2194
  • [33] Latent Dirichlet mixture model
    Chien, Jen-Tzung
    Lee, Chao-Hsi
    Tan, Zheng-Hua
    NEUROCOMPUTING, 2018, 278 : 12 - 22
  • [34] A generalized Dirichlet model
    Thomas, Seemon
    Jacob, Joy
    STATISTICS & PROBABILITY LETTERS, 2006, 76 (16) : 1761 - 1767
  • [35] Dirichlet Problems for the Generalized n-Poisson Equation
    Aksoy, U.
    Celebi, A. O.
    PSEUDO-DIFFERENTIAL OPERATORS: COMPLEX ANALYSIS AND PARTIAL DIFFERENTIAL EQUATIONS, 2010, 205 : 129 - +
  • [36] Variational-based latent generalized Dirichlet allocation model in the collapsed space and applications
    Ihou, Koffi Eddy
    Bouguila, Nizar
    NEUROCOMPUTING, 2019, 332 : 372 - 395
  • [37] Parallel inference for cross-collection latent generalized Dirichlet allocation model and applications
    Luo, Zhiwen
    Amayri, Manar
    Fan, Wentao
    Ihou, Koffi Eddy
    Bouguila, Nizar
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [38] Topic Modeling of Online Accommodation Reviews via Latent Dirichlet Allocation
    Sutherland, Ian
    Sim, Youngseok
    Lee, Seul Ki
    Byun, Jaemun
    Kiatkawsin, Kiattipoom
    SUSTAINABILITY, 2020, 12 (05) : 1 - 15
  • [39] A Topic Model Based on Poisson Decomposition
    Jiang, Haixin
    Zhou, Rui
    Zhang, Limeng
    Wang, Hua
    Zhang, Yanchun
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 1489 - 1498
  • [40] Topic and Trend Detection in Text Collections Using Latent Dirichlet Allocation
    Bolelli, Levent
    Ertekin, Seyda
    Giles, C. Lee
    ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5478 : 776 - +