GPLDA: A Generalized Poisson Latent Dirichlet Topic Model

Authors
Bala, Ibrahim Bakari [1]
Saringat, Mohd Zainuri [1]
Affiliation
[1] Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Johor Baharu, Malaysia
Keywords
Bag-of-words; generalized Poisson distribution; topic model; latent Dirichlet allocation
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
The earliest modifications of Latent Dirichlet Allocation (LDA) with respect to word or document attributes relaxed the exchangeability assumption embodied in its bag-of-words (BoW) matrix. Several authors have since proposed modifications of the original LDA that focus on models in which the current topic depends on the words of the previous topic. Most earlier work ignored the document length distribution, on the assumption that it drops out at the modelling stage. In this paper, the Poisson document length distribution of the LDA model is therefore replaced with the Generalized Poisson (GP) distribution, which can capture more complex structure. The main strength of the GP distribution is that it captures both overdispersed (variance larger than the mean) and underdispersed (variance smaller than the mean) count data. The Poisson distribution used by LDA relies strongly on the assumption that the mean and variance of document lengths are equal. This assumption is often unrealistic for real-life text data, where the variance of document length may be greater or less than the mean. Approximate estimates of the GPLDA model parameters were obtained by applying the Newton-Raphson method to the log-likelihood. A comparative performance analysis of GPLDA and LDA using accuracy and F1 score showed improved results for GPLDA.
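To make the dispersion argument concrete, the following is a minimal sketch of fitting a GP distribution to document lengths with Newton-Raphson. It is not the authors' estimation procedure, which the abstract does not spell out. It assumes Consul's parameterization of the GP distribution, P(X = x) = theta * (theta + lam * x)^(x - 1) * exp(-theta - lam * x) / x!, with mean theta / (1 - lam) and variance theta / (1 - lam)^3, and it handles only the overdispersed case 0 <= lam < 1. The function names and the sample `doc_lengths` are hypothetical and chosen for illustration.

```python
import math


def gp_log_pmf(x, theta, lam):
    """Log-pmf of Consul's generalized Poisson (assumes theta > 0, 0 <= lam < 1)."""
    return (math.log(theta) + (x - 1) * math.log(theta + lam * x)
            - theta - lam * x - math.lgamma(x + 1))


def gp_mean_var(theta, lam):
    """Mean and variance implied by (theta, lam); lam = 0 recovers the Poisson."""
    return theta / (1 - lam), theta / (1 - lam) ** 3


def fit_gp_newton(lengths, tol=1e-10, max_iter=100):
    """Maximum-likelihood fit via Newton-Raphson on the profile score in lam.

    Setting the log-likelihood gradient to zero gives theta = mean * (1 - lam),
    which leaves one equation in lam:
        sum_i x_i (x_i - 1) / (mean + lam * (x_i - mean)) - n * mean = 0.
    Sketch only: assumes overdispersed data and has no safeguards for
    zero variance or a vanishing derivative.
    """
    n = len(lengths)
    mean = sum(lengths) / n
    var = sum((x - mean) ** 2 for x in lengths) / (n - 1)
    # Method-of-moments starting value, clamped to [0, 0.95].
    lam = min(max(1 - math.sqrt(mean / var), 0.0), 0.95)
    for _ in range(max_iter):
        g = sum(x * (x - 1) / (mean + lam * (x - mean)) for x in lengths) - n * mean
        dg = -sum(x * (x - 1) * (x - mean) / (mean + lam * (x - mean)) ** 2
                  for x in lengths)
        step = g / dg
        lam -= step
        if abs(step) < tol:
            break
    return mean * (1 - lam), lam


# Hypothetical document lengths whose sample variance exceeds the sample mean.
doc_lengths = [2, 3, 5, 8, 13, 4, 6, 20, 1, 7]
theta, lam = fit_gp_newton(doc_lengths)
fitted_mean, fitted_var = gp_mean_var(theta, lam)
print(f"theta={theta:.4f} lam={lam:.4f} mean={fitted_mean:.4f} var={fitted_var:.4f}")
```

A plain Poisson fit would force the variance to equal the mean, whereas the fitted GP reproduces the sample mean exactly and lets the extra parameter `lam` absorb the excess variance. Underdispersed data would need a negative `lam` and a truncated support, which this sketch does not cover.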
Pages: 403-407
Page count: 5