GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model

Cited by: 0
Authors
Zhu, Qile [1 ]
Feng, Zheng [1 ]
Li, Xiaolin [1 ]
Affiliations
[1] Univ Florida, NSF Ctr Big Learning, Large Scale Intelligent Syst Lab, Gainesville, FL 32611 USA
Funding
U.S. National Institutes of Health; U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Discovering the latent topics within texts has been a fundamental task for many applications. However, conventional topic models suffer from different problems in different settings. Latent Dirichlet Allocation (LDA) may not work well for short texts due to data sparsity (i.e., the sparse word co-occurrence patterns in short documents). The Biterm Topic Model (BTM) learns topics by modeling word pairs, called biterms, over the whole corpus. This assumption is overly strong when documents are long and rich in topic information, and it does not exploit the transitivity of biterms. In this paper, we propose a novel model called GraphBTM that represents biterms as graphs and designs Graph Convolutional Networks (GCNs) with residual connections to extract transitive features from biterms. To overcome the data sparsity of LDA and the strong assumption of BTM, we sample a fixed number of documents to form a mini-corpus as a training instance. We also propose a dataset called All News, extracted from (Thompson, 2017), in which documents are much longer than those in 20 Newsgroups. We present an amortized variational inference method for GraphBTM. Our method generates more coherent topics than previous approaches, and experiments show that the sampling strategy improves performance by a large margin.
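The abstract's core construction can be illustrated concretely. The following is a minimal sketch, not the authors' implementation: it samples a mini-corpus, extracts biterms (unordered word pairs co-occurring within a document), builds a word-word adjacency matrix, and applies one standard GCN propagation step with a residual connection. All function names, the toy vocabulary, and the identity feature/weight matrices are illustrative assumptions.

```python
# Sketch (assumed, not the paper's code): biterm graph + one GCN step.
from itertools import combinations
import numpy as np

def build_biterm_graph(mini_corpus, vocab):
    """Return a |V| x |V| adjacency matrix counting biterm co-occurrences."""
    idx = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for doc in mini_corpus:
        tokens = set(w for w in doc if w in idx)
        for w1, w2 in combinations(tokens, 2):  # each unordered pair once
            A[idx[w1], idx[w2]] += 1
            A[idx[w2], idx[w1]] += 1
    return A

def gcn_layer(A, H, W):
    """One graph convolution with a residual connection:
    H' = relu(D^-1/2 (A+I) D^-1/2 H W) + H  (symmetric normalization)."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    H_new = np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)
    return H_new + H                            # residual connection

# Toy mini-corpus of two "documents" over a four-word vocabulary.
vocab = ["graph", "topic", "model", "text"]
docs = [["graph", "topic"], ["topic", "model", "text"]]
A = build_biterm_graph(docs, vocab)
H = np.eye(len(vocab))                          # one-hot word features
W = np.eye(len(vocab))                          # identity weights for the demo
H1 = gcn_layer(A, H, W)
```

In the actual model the GCN output would feed an amortized variational encoder over the mini-corpus; here the layer simply demonstrates how biterm co-occurrence edges let features propagate between words that never co-occur directly (transitivity).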
Pages: 4663 - 4672
Page count: 10
Related Papers
(50 total)
  • [1] Stochastic Collapsed Variational Bayesian Inference for Biterm Topic Model
    Awaya, Narutaka
    Kitazono, Jun
    Omori, Toshiaki
    Ozawa, Seiichi
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 3364 - 3370
  • [2] Two time-efficient gibbs sampling inference algorithms for biterm topic model
    Zhou, Xiaotang
    Ouyang, Jihong
    Li, Ximing
    APPLIED INTELLIGENCE, 2018, 48 (03) : 730 - 754
  • [3] Two time-efficient gibbs sampling inference algorithms for biterm topic model
    Xiaotang Zhou
    Jihong Ouyang
    Ximing Li
    Applied Intelligence, 2018, 48 : 730 - 754
  • [4] Sparse Biterm Topic Model for Short Texts
    Zhu, Bingshan
    Cai, Yi
    Zhang, Huakui
    WEB AND BIG DATA, APWEB-WAIM 2021, PT I, 2021, 12858 : 227 - 241
  • [5] User Based Aggregation for Biterm Topic Model
    Chen, Weizheng
    Wang, Jinpeng
    Zhang, Yan
    Yan, Hongfei
    Li, Xiaoming
    PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL) AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (IJCNLP), VOL 2, 2015, : 489 - 494
  • [6] A Biterm Topic Model for Sparse Mutation Data
    Sason, Itay
    Chen, Yuexi
    Leiserson, Mark D. M.
    Sharan, Roded
    CANCERS, 2023, 15 (05)
  • [7] Improving biterm topic model with word embeddings
    Jiajia Huang
    Min Peng
    Pengwei Li
    Zhiwei Hu
    Chao Xu
    World Wide Web, 2020, 23 : 3099 - 3124
  • [8] Improving biterm topic model with word embeddings
    Huang, Jiajia
    Peng, Min
    Li, Pengwei
    Hu, Zhiwei
    Xu, Chao
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2020, 23 (06): : 3099 - 3124
  • [9] Reweighted autoencoded variational Bayes for enhanced sampling (RAVE)
    Ribeiro, Joao Marcelo Lamim
    Bravo, Pablo
    Wang, Yihang
    Tiwary, Pratyush
    JOURNAL OF CHEMICAL PHYSICS, 2018, 149 (07):
  • [10] FastBTM: Reducing the sampling time for biterm topic model
    He, Xingwei
    Xu, Hua
    Li, Jia
    He, Liu
    Yu, Linlin
    KNOWLEDGE-BASED SYSTEMS, 2017, 132 : 11 - 20