Integration of Neural Embeddings and Probabilistic Models in Topic Modeling

Cited by: 0
Authors
Koochemeshkian, Pantea [1 ]
Bouguila, Nizar [1 ]
Institutions
[1] Concordia Institute for Information Systems Engineering (CIISE), Montreal, QC, Canada
Keywords
Dirichlet; extraction
DOI
10.1080/08839514.2024.2403904
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Topic modeling, a technique for discovering latent themes in large text collections, has advanced considerably with the help of deep learning. This paper presents two novel approaches to topic modeling that integrate embeddings derived from BERTopic with the multi-grain clustering topic model (MGCTM). Recognizing the inherently hierarchical, multi-scale nature of topics in corpora, our methods use MGCTM to capture topic structure at multiple levels of granularity. We enhance the expressiveness of MGCTM by introducing the Generalized Dirichlet and Beta-Liouville distributions as priors, which offer greater flexibility in modeling topic proportions and capture richer topic relationships. Comprehensive experiments on several datasets demonstrate that the proposed models achieve superior topic coherence and granularity compared with state-of-the-art methods. Our findings underscore the potential of hybrid architectures that marry neural embeddings with advanced probabilistic modeling to push the boundaries of topic modeling.
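The abstract does not include code, so the following is a minimal illustrative sketch (not the authors' implementation) of what the Generalized Dirichlet prior over topic proportions looks like computationally. Unlike the standard Dirichlet, the Generalized Dirichlet has two shape parameters per component, which is the extra flexibility the abstract refers to. A sample can be drawn by stick-breaking with independent Beta variates; the function name and parameter choices below are hypothetical.

```python
import random

def sample_generalized_dirichlet(alphas, betas, rng=None):
    """Draw one (K+1)-dimensional topic-proportion vector from a
    Generalized Dirichlet prior GD(alphas, betas) by stick-breaking:
    v_i ~ Beta(alpha_i, beta_i), p_i = v_i * prod_{j<i} (1 - v_j)."""
    rng = rng or random.Random(0)
    remaining = 1.0          # unallocated probability mass ("stick")
    proportions = []
    for a, b in zip(alphas, betas):
        v = rng.betavariate(a, b)      # independent Beta break point
        proportions.append(remaining * v)
        remaining *= 1.0 - v
    proportions.append(remaining)      # leftover mass goes to the last topic
    return proportions

# Example: 4 topic proportions from a 3-pair GD prior (parameters are arbitrary).
props = sample_generalized_dirichlet([1.0, 2.0, 3.0], [2.0, 1.0, 1.5])
assert abs(sum(props) - 1.0) < 1e-9   # valid point on the probability simplex
```

A standard Dirichlet is the special case where each `betas[i]` equals the sum of the remaining alpha parameters; decoupling the two per-component parameters lets the prior express covariance structures between topics that the Dirichlet cannot.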
Pages: 33