Leveraging external information in topic modelling

Authors
He Zhao
Lan Du
Wray Buntine
Gang Liu
Affiliations
[1] Monash University, Faculty of Information Technology
[2] Harbin Engineering University, College of Computer Science and Technology
Keywords
Latent Dirichlet allocation; Side information; Data augmentation; Gibbs sampling
DOI
Not available
Abstract
Besides their text content, documents often come with rich meta-information, such as document categories and semantic/syntactic features of words like those encoded in word embeddings. Incorporating such meta-information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially when the word-occurrence information in the training data is insufficient. In this article, we present a topic model called MetaLDA, which can leverage document meta-information, word meta-information, or both jointly in its generative process. Using two data augmentation techniques, we derive an efficient Gibbs sampling algorithm that benefits from the full local conjugacy of the model. The algorithm also benefits from the sparsity of the meta-information. Extensive experiments on several real-world datasets demonstrate that our model achieves superior performance in terms of both perplexity and topic quality, particularly on sparse texts. In addition, our model runs significantly faster than other models that use meta-information.
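To make the idea of meta-information entering the generative process more concrete, here is a minimal sketch. It is not the authors' implementation, and it rests on stated assumptions. Meta-features are assumed binary. Each document's Dirichlet parameter `alpha[d][k]` is built as a product of positive per-label weights `lam[l][k]`, and each topic's word parameter `beta[k][v]` is built as a product of per-feature weights `delta[l][k]`; this log-linear form is an assumption. A plain collapsed LDA Gibbs sampler then runs with these priors held fixed. The names `build_priors`, `gibbs_lda`, `lam` and `delta` are hypothetical. The data-augmentation step that the abstract mentions for sampling the weights themselves is omitted.

```python
import math
import random


def build_priors(doc_labels, word_feats, lam, delta):
    """Assumed log-linear Dirichlet priors from binary meta-features.

    alpha[d][k] = prod over active labels l of lam[l][k]
    beta[k][v]  = prod over active features l of delta[l][k]
    An empty product (no active features) gives 1.
    """
    num_topics = len(lam[0])
    alpha = [[math.prod(lam[l][k] for l, f in enumerate(labels) if f)
              for k in range(num_topics)]
             for labels in doc_labels]
    beta = [[math.prod(delta[l][k] for l, g in enumerate(feats) if g)
             for feats in word_feats]
            for k in range(num_topics)]
    return alpha, beta


def gibbs_lda(docs, alpha, beta, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with per-document alpha and
    per-topic word priors beta (hyperparameters kept fixed here)."""
    rng = random.Random(seed)
    num_topics, vocab = len(beta), len(beta[0])
    beta_sum = [sum(row) for row in beta]
    n_dk = [[0] * num_topics for _ in docs]       # doc-topic counts
    n_kv = [[0] * vocab for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                        # tokens per topic
    z = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for v in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kv[k][v] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, v in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kv[k][v] -= 1
                n_k[k] -= 1
                # p(z = j | rest) is proportional to
                # (n_dj + alpha_dj) * (n_jv + beta_jv) / (n_j + sum_v beta_jv)
                weights = [(n_dk[d][j] + alpha[d][j])
                           * (n_kv[j][v] + beta[j][v])
                           / (n_k[j] + beta_sum[j])
                           for j in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kv[k][v] += 1
                n_k[k] += 1
    return z, n_dk, n_kv


if __name__ == "__main__":
    docs = [[0, 1, 0, 2], [3, 4, 3], [0, 2, 1], [4, 3, 4]]  # word ids, V = 5
    # Label 0 is a hypothetical always-on default label.
    doc_labels = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 1]]
    # Feature 0 is a hypothetical always-on default word feature.
    word_feats = [[1, 1], [1, 1], [1, 1], [1, 0], [1, 0]]
    lam = [[0.1, 0.1], [1.0, 0.2], [0.2, 1.0]]  # 3 doc labels x K = 2 topics
    delta = [[0.01, 0.01], [5.0, 1.0]]          # 2 word features x K = 2
    alpha, beta = build_priors(doc_labels, word_feats, lam, delta)
    z, n_dk, n_kv = gibbs_lda(docs, alpha, beta, n_iters=100)
    print(n_dk)
```

Documents that share a label receive the same boost on the corresponding topics. Words that share a feature receive the same boost in the corresponding topic-word distributions. Sparse binary features keep each product cheap to compute.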
Pages: 661-693
Number of pages: 32