Gaussian hierarchical latent Dirichlet allocation: Bringing polysemy back

Cited by: 4
Authors
Yoshida, Takahiro [1 ]
Hisano, Ryohei [2 ,4 ]
Ohnishi, Takaaki [3 ]
Affiliations
[1] Canon Inst Global Studies, Tokyo, Japan
[2] Grad Sch Informat Sci & Technol, Tokyo, Japan
[3] Rikkyo Univ, Grad Sch Artificial Intelligence & Sci, Tokyo, Japan
[4] 7-3-1 Hongo,Bunkyo Ku, Tokyo, Japan
Source
PLOS ONE | 2023, Vol. 18, No. 07
Keywords
MODELS;
DOI
10.1371/journal.pone.0288274
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biosciences]; N [General Natural Sciences];
Subject Classification Codes
07 ; 0710 ; 09 ;
Abstract
Topic models are widely used to discover the latent representation of a set of documents. The two canonical models are latent Dirichlet allocation (LDA) and Gaussian latent Dirichlet allocation (GLDA): the former uses multinomial distributions over words as its latent topic representations, while the latter uses multivariate Gaussian distributions over pre-trained word embedding vectors. Compared with LDA, GLDA is limited in that it does not capture the polysemy of a word such as "bank." In this paper, we show that GLDA can recover the ability to capture polysemy by introducing a hierarchical structure into the set of topics that the model can use to represent a given document. Our Gaussian hierarchical latent Dirichlet allocation significantly improves polysemy detection compared with Gaussian-based models and provides more parsimonious topic representations than hierarchical latent Dirichlet allocation. Extensive quantitative experiments show that our model also achieves better topic coherence and held-out document predictive accuracy than GLDA and CGTM over a wide range of corpora and word embedding vectors. Our model learns the underlying topic distribution and the hierarchical structure among topics simultaneously, and the latter can further be used to understand the correlations among topics. Moreover, the added flexibility of our model does not necessarily increase the time complexity compared with GLDA and CGTM, making it a strong competitor to GLDA.
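To make the generative idea concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the authors' implementation: the topic tree is fixed at three levels over a toy 2-D embedding space (all node names and parameters below are hypothetical), whereas the actual model infers the tree from data and works with real pre-trained embeddings. Each tree node is a Gaussian topic; a document chooses a root-to-leaf path and draws each word's embedding vector from one of the Gaussians on that path.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed topic tree over a toy 2-D embedding space.
# Each node is a Gaussian topic: (mean, covariance). In the real model the
# tree structure and the Gaussians are inferred, not fixed.
EMB_DIM = 2
tree = {
    "root":       (np.zeros(EMB_DIM),     np.eye(EMB_DIM) * 2.0),  # generic words
    "finance":    (np.array([4.0, 0.0]),  np.eye(EMB_DIM) * 0.5),
    "geography":  (np.array([-4.0, 0.0]), np.eye(EMB_DIM) * 0.5),
    "bank.money": (np.array([5.0, 2.0]),  np.eye(EMB_DIM) * 0.2),  # "bank" = institution
    "bank.river": (np.array([-5.0, 2.0]), np.eye(EMB_DIM) * 0.2),  # "bank" = riverside
}
paths = [
    ["root", "finance", "bank.money"],    # path for finance documents
    ["root", "geography", "bank.river"],  # path for geography documents
]

def generate_document(n_words, alpha=1.0):
    """Draw one document: pick a root-to-leaf path, draw per-document level
    weights theta ~ Dirichlet(alpha), then emit each word's embedding vector
    from the Gaussian topic at a level sampled from theta."""
    path = paths[rng.integers(len(paths))]
    theta = rng.dirichlet([alpha] * len(path))
    vectors = []
    for _ in range(n_words):
        level = rng.choice(len(path), p=theta)  # which node on the path
        mean, cov = tree[path[level]]
        vectors.append(rng.multivariate_normal(mean, cov))
    return path, np.array(vectors)

path, doc = generate_document(20)
print(path, doc.mean(axis=0))
```

Because the two senses of "bank" live at different leaves, the same surface word can be explained by different Gaussian topics depending on which path a document follows; plain GLDA, with one flat set of Gaussians shared by all documents, has no such document-level switch.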
Pages: 18
Related papers
50 records in total
  • [1] Discovering Latent Topics by Gaussian Latent Dirichlet Allocation and Spectral Clustering
    Yuan, Bo
    Gao, Xinbo
    Niu, Zhenxing
    Tian, Qi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (01)
  • [2] Formation of Hierarchical Object Concept Using Hierarchical Latent Dirichlet Allocation
    Ando, Yoshiki
    Nakamura, Tomoaki
    Araki, Takaya
    Nagai, Takayuki
    2013 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2013, : 2272 - 2279
  • [3] Image hierarchical representations models based on latent dirichlet allocation
    Wang, Fushun
    Li, Yan
    Sun, Xiaohua
    Cai, Zhenjiang
JOURNAL OF MULTIMEDIA, 2013, 8 (04): 358 - 364
  • [4] A Hierarchical Model Based on Latent Dirichlet Allocation for Action Recognition
    Yang, Shuang
    Yuan, Chunfeng
    Hu, Weiming
    Ding, Xinmiao
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2613 - 2618
  • [5] HIERARCHICAL LATENT DIRICHLET ALLOCATION MODELS FOR REALISTIC ACTION RECOGNITION
    Li, Heping
    Liu, Jie
    Zhang, Shuwu
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 1297 - 1300
  • [6] Comparing Hierarchical Dirichlet Process with Latent Dirichlet Allocation in Bug Report Multiclass Classification
    Limsettho, Nachai
    Hata, Hideaki
    Matsumoto, Ken-ichi
    2014 15TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2014, : 137 - 142
  • [7] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [8] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 601 - 608
  • [9] AUGMENTED LATENT DIRICHLET ALLOCATION (LDA) TOPIC MODEL WITH GAUSSIAN MIXTURE TOPICS
    Prabhudesai, Kedar S.
    Mainsah, Boyla O.
    Collins, Leslie M.
    Throckmorton, Chandra S.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2451 - 2455
  • [10] Sequential latent Dirichlet allocation
    Du, Lan
    Buntine, Wray
    Jin, Huidong
    Chen, Changyou
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 31 (03) : 475 - 503