Integration of Neural Embeddings and Probabilistic Models in Topic Modeling

Cited by: 0
Authors
Koochemeshkian, Pantea [1]
Bouguila, Nizar [1]
Affiliations
[1] Concordia Inst Informat Syst Engn CIISE, Informat Syst Engn, Montreal, PQ, Canada
Keywords
DIRICHLET; EXTRACTION;
DOI
10.1080/08839514.2024.2403904
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Topic modeling, a way to find topics in large volumes of text, has grown with the help of deep learning. This paper presents two novel approaches to topic modeling by integrating embeddings derived from Bert-Topic with the multi-grain clustering topic model (MGCTM). Recognizing the inherent hierarchical and multi-scale nature of topics in corpora, our methods utilize MGCTM to capture topic structures at multiple levels of granularity. We enhance the expressiveness of MGCTM by introducing the Generalized Dirichlet and Beta-Liouville distributions as priors, which provide greater flexibility in modeling topic proportions and capturing richer topic relationships. Comprehensive experiments on various datasets showcase the effectiveness of our proposed models in achieving superior topic coherence and granularity compared to state-of-the-art methods. Our findings underscore the potential of leveraging hybrid architectures, marrying neural embeddings with advanced probabilistic modeling, to push the boundaries of topic modeling.
Pages: 33