Leveraging external information in topic modelling

Authors
He Zhao
Lan Du
Wray Buntine
Gang Liu
Affiliations
[1] Monash University, Faculty of Information Technology
[2] Harbin Engineering University, College of Computer Science and Technology
Keywords
Latent Dirichlet allocation; Side information; Data augmentation; Gibbs sampling
Abstract
Besides the text content, documents usually come with rich sets of meta-information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta-information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence information in the training data is insufficient. In this article, we present a topic model called MetaLDA, which is able to leverage either document or word meta-information, or both of them jointly, in the generative process. With two data augmentation techniques, we can derive an efficient Gibbs sampling algorithm, which benefits from the fully local conjugacy of the model. Moreover, the algorithm is favoured by the sparsity of the meta-information. Extensive experiments on several real-world datasets demonstrate that our model achieves superior performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, our model runs significantly faster than other models using meta-information.
Pages: 661-693
Page count: 32