Integration of Neural Embeddings and Probabilistic Models in Topic Modeling

Cited by: 0
Authors
Koochemeshkian, Pantea [1]
Bouguila, Nizar [1]
Affiliations
[1] Concordia Inst Informat Syst Engn CIISE, Informat Syst Engn, Montreal, PQ, Canada
Keywords
DIRICHLET; EXTRACTION
DOI
10.1080/08839514.2024.2403904
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Topic modeling, a way to find topics in large volumes of text, has grown with the help of deep learning. This paper presents two novel approaches to topic modeling by integrating embeddings derived from Bert-Topic with the multi-grain clustering topic model (MGCTM). Recognizing the inherent hierarchical and multi-scale nature of topics in corpora, our methods utilize MGCTM to capture topic structures at multiple levels of granularity. We enhance the expressiveness of MGCTM by introducing the Generalized Dirichlet and Beta-Liouville distributions as priors, which provide greater flexibility in modeling topic proportions and capturing richer topic relationships. Comprehensive experiments on various datasets showcase the effectiveness of our proposed models in achieving superior topic coherence and granularity compared to state-of-the-art methods. Our findings underscore the potential of leveraging hybrid architectures, marrying neural embeddings with advanced probabilistic modeling, to push the boundaries of topic modeling.
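To illustrate why a Generalized Dirichlet prior is more flexible than a standard Dirichlet for topic proportions, the following is a minimal sketch of its well-known stick-breaking construction. It is not the authors' implementation: the function name `sample_generalized_dirichlet` and the parameter values are hypothetical and chosen only for illustration.

```python
import random


def sample_generalized_dirichlet(alphas, betas, rng=random):
    """Draw one topic-proportion vector from a Generalized Dirichlet prior.

    Stick-breaking construction: v_k ~ Beta(alpha_k, beta_k) and
    theta_k = v_k * prod_{j<k} (1 - v_j). The last component takes the
    remaining mass, so the result has len(alphas) + 1 entries summing to 1.
    """
    if len(alphas) != len(betas):
        raise ValueError("alphas and betas must have the same length")
    remaining = 1.0  # mass of the stick that has not been assigned yet
    theta = []
    for a, b in zip(alphas, betas):
        v = rng.betavariate(a, b)  # fraction of the remaining stick
        theta.append(remaining * v)
        remaining *= 1.0 - v
    theta.append(remaining)  # final topic receives what is left
    return theta


# Hypothetical parameters for a 4-topic example (illustrative values only).
rng = random.Random(0)
theta = sample_generalized_dirichlet([2.0, 1.0, 3.0], [4.0, 2.0, 1.0], rng)
print(theta)
```

Each break has its own pair of Beta parameters, so the prior has roughly twice as many free parameters as a Dirichlet over the same number of topics. The standard Dirichlet is recovered as the special case where each beta_k equals alpha_{k+1} + beta_{k+1}.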
Pages: 33