Leveraging external information in topic modelling

Cited by: 0
Authors
He Zhao
Lan Du
Wray Buntine
Gang Liu
Affiliations
[1] Monash University,Faculty of Information Technology
[2] Harbin Engineering University,College of Computer Science and Technology
Keywords
Latent Dirichlet allocation; Side information; Data augmentation; Gibbs sampling
DOI
Not available
Abstract
Besides their text content, documents usually come with rich meta-information, such as document categories and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta-information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially when the word-occurrence information in the training data is insufficient. In this article, we present a topic model called MetaLDA, which can leverage document meta-information, word meta-information, or both jointly in the generative process. With two data augmentation techniques, we derive an efficient Gibbs sampling algorithm that benefits from the fully local conjugacy of the model. The algorithm also takes advantage of the sparsity of the meta-information. Extensive experiments on several real-world datasets demonstrate that our model achieves superior performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, our model runs significantly faster than other models using meta-information.
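To make the Gibbs sampling idea in the abstract concrete, the following is a minimal sketch of a collapsed Gibbs sampler for plain latent Dirichlet allocation. It is not the MetaLDA algorithm: the abstract does not say how the priors are built from meta-information, and the two data augmentation steps are not shown. The function name `gibbs_lda` and the parameters `alpha[d][k]` (a per-document topic prior) and `beta[k][v]` (a per-topic word prior) are hypothetical stand-ins for the places where a model of this kind could plug in document-level and word-level meta-information.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha, beta, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative, not MetaLDA).

    docs  : list of documents, each a list of word ids in [0, vocab_size)
    alpha : alpha[d][k] > 0, assumed per-document topic prior
    beta  : beta[k][v] > 0, assumed per-topic word prior
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per document
    n_kv = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # token counts per topic
    z = []                                              # topic of every token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(n_topics) for _ in doc]
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d][k] += 1
            n_kv[k][w] += 1
            n_k[k] += 1

    beta_sum = [sum(row) for row in beta]
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kv[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha[d][t])
                    * (n_kv[t][w] + beta[t][w])
                    / (n_k[t] + beta_sum[t])
                    for t in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kv[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kv


# Toy corpus: three short documents over a four-word vocabulary.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1]]
alpha = [[0.1, 0.1] for _ in docs]    # symmetric here; could differ per document
beta = [[0.01] * 4 for _ in range(2)]  # symmetric here; could differ per word
n_dk, n_kv = gibbs_lda(docs, n_topics=2, vocab_size=4, alpha=alpha, beta=beta)
```

Because `alpha` is indexed by document and `beta` by topic and word, asymmetric priors can be supplied without changing the sampler, which is the only design point this sketch is meant to illustrate.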
Pages: 661-693
Number of pages: 32
Citation
Zhao, He; Du, Lan; Buntine, Wray; Liu, Gang. Leveraging external information in topic modelling. Knowledge and Information Systems, 2019, 61 (02): 661-693.