Leveraging external information in topic modelling

Cited by: 0
Authors
He Zhao
Lan Du
Wray Buntine
Gang Liu
Affiliations
[1] Monash University,Faculty of Information Technology
[2] Harbin Engineering University,College of Computer Science and Technology
Keywords
Latent Dirichlet allocation; Side information; Data augmentation; Gibbs sampling
DOI
Not available
Abstract
Besides the text content, documents usually come with rich sets of meta-information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta-information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence information in the training data is insufficient. In this article, we present a topic model called MetaLDA, which is able to leverage either document or word meta-information, or both of them jointly, in the generative process. With two data augmentation techniques, we can derive an efficient Gibbs sampling algorithm, which benefits from the fully local conjugacy of the model. Moreover, the algorithm is favoured by the sparsity of the meta-information. Extensive experiments on several real-world datasets demonstrate that our model achieves superior performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, our model runs significantly faster than other models using meta-information.
Pages: 661-693
Number of pages: 32