GPLDA: A Generalized Poisson Latent Dirichlet Topic Model

Cited by: 0
Authors
Bala, Ibrahim Bakari [1]
Saringat, Mohd Zainuri [1]
Affiliations
[1] Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Johor Baharu, Malaysia
Keywords
Bag-of-word; generalized Poisson distribution; topic model; latent Dirichlet allocation;
DOI
Not available
Chinese Library Classification
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
The earliest modifications of Latent Dirichlet Allocation (LDA) with respect to word or document attributes relaxed the exchangeability assumption it makes through the bag-of-words (BoW) matrix. Several authors have proposed modifications of the original LDA that focus on models in which the current topic depends on the words from the previous topic. Most of the earlier work ignored the document length distribution, on the assumption that its effect vanishes at the modelling stage. In this paper, the Poisson document length distribution of the LDA model is therefore replaced with the Generalized Poisson (GP) distribution, which can capture more complex structures. The main strength of the GP distribution is that it captures both overdispersed (variance larger than the mean) and underdispersed (variance smaller than the mean) count data. The Poisson distribution used by LDA relies strongly on the assumption that the mean and variance of document lengths are equal. This assumption is often unrealistic for real-life text data, where the variance of document length may be greater than or less than its mean. Approximate estimates of the GPLDA model parameters were obtained using the Newton-Raphson technique applied to the log-likelihood. A comparison of GPLDA with LDA on accuracy and F1 score showed improved results for GPLDA.
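The abstract does not give the GP parameterization or the GPLDA log-likelihood, so the following is only a minimal sketch of the general idea, restricted to the document length distribution alone. It assumes Consul's two-parameter generalized Poisson, P(X = x) = θ(θ + xλ)^(x-1) exp(-θ - xλ) / x!, whose mean is θ/(1-λ) and variance is θ/(1-λ)^3, so λ > 0 gives overdispersion, λ < 0 underdispersion, and λ = 0 recovers the Poisson. The function names (`gp_logpmf`, `gp_score`, `fit_gp`) and the sample lengths are hypothetical, chosen for illustration; this is not the authors' estimation procedure for the full model.

```python
import math


def gp_logpmf(x, theta, lam):
    """Log-pmf of Consul's generalized Poisson (assumed parameterization).

    Requires theta > 0 and theta + x * lam > 0; lam = 0 gives the Poisson.
    """
    return (math.log(theta) + (x - 1) * math.log(theta + x * lam)
            - theta - x * lam - math.lgamma(x + 1))


def gp_score(lengths, lam):
    """Profile score in lam after substituting theta = mean * (1 - lam)."""
    n = len(lengths)
    mean = sum(lengths) / n
    return sum(x * (x - 1) / (mean + (x - mean) * lam) for x in lengths) - n * mean


def gp_score_prime(lengths, lam):
    """Derivative of gp_score with respect to lam."""
    mean = sum(lengths) / len(lengths)
    return -sum(x * (x - 1) * (x - mean) / (mean + (x - mean) * lam) ** 2
                for x in lengths)


def fit_gp(lengths, tol=1e-10, max_iter=100):
    """Newton-Raphson maximum-likelihood fit; returns (theta, lam).

    Assumes the lengths are not all equal (otherwise the derivative is zero).
    """
    mean = sum(lengths) / len(lengths)
    lam = 0.0  # start from the ordinary Poisson
    for _ in range(max_iter):
        step = gp_score(lengths, lam) / gp_score_prime(lengths, lam)
        # Halve the step while it would make any theta + x * lam term non-positive.
        while any(mean + (x - mean) * (lam - step) <= 0 for x in lengths):
            step /= 2
        lam -= step
        if abs(step) < tol:
            break
    return mean * (1 - lam), lam


# Hypothetical document lengths whose variance is far larger than their mean.
doc_lengths = [12, 45, 7, 88, 23, 150, 31, 9, 64, 18]
theta_hat, lam_hat = fit_gp(doc_lengths)
print(f"theta = {theta_hat:.4f}, lambda = {lam_hat:.4f}")
```

For overdispersed lengths such as these, the fitted λ is positive, which is the extra flexibility a plain Poisson length model lacks. The sketch starts Newton-Raphson at λ = 0 so that the first iterate is the Poisson special case.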
Pages: 403-407
Page count: 5