Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

Cited by: 6
Authors
Magnusson, Mans [1 ]
Jonsson, Leif [1 ,2 ]
Villani, Mattias [1 ]
Broman, David [3 ]
Affiliations
[1] Linkoping Univ, Dept Comp & Informat Sci, S-58183 Linkoping, Sweden
[2] Ericsson AB, Stockholm, Sweden
[3] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
Keywords
Bayesian inference; Computational complexity; Gibbs sampling; Latent Dirichlet allocation; Massive datasets; Parallel computing
DOI
10.1080/10618600.2017.1366913
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Topic models, and more specifically the class of latent Dirichlet allocation (LDA), are widely used for probabilistic modeling of text. Markov chain Monte Carlo (MCMC) sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler. Supplementary materials for this article are available online.
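The abstract outlines the core idea: rather than collapsing out both the topic-word matrix Phi and the document-topic proportions theta, only theta is integrated out and Phi is kept as an explicit parameter, so that, conditional on Phi, the topic indicators of different documents are independent and can be updated in parallel. Below is a minimal, hypothetical Python sketch of one iteration of such a partially collapsed Gibbs sampler on toy data; all names, sizes, and hyperparameter values are illustrative assumptions, and the sparsity exploits and the two indicator-sampling strategies compared in the paper are omitted.

```python
import numpy as np

# Minimal sketch of a partially collapsed Gibbs sampler for LDA:
# theta (document-topic proportions) is integrated out, Phi (topic-word
# probabilities) is sampled explicitly. Data, sizes, and hyperparameters
# are illustrative toy choices, not the paper's settings.

rng = np.random.default_rng(0)
K, V = 3, 10                       # number of topics, vocabulary size
alpha, beta = 0.1, 0.01            # Dirichlet hyperparameters
docs = [rng.integers(V, size=20) for _ in range(5)]   # toy corpus
z = [rng.integers(K, size=len(d)) for d in docs]      # topic indicators

for it in range(100):
    # Recompute sufficient statistics: topic-word and document-topic counts.
    n_kw = np.zeros((K, V))
    n_dk = np.zeros((len(docs), K))
    for d, (ws, zs) in enumerate(zip(docs, z)):
        for w, k in zip(ws, zs):
            n_kw[k, w] += 1
            n_dk[d, k] += 1

    # Step 1: draw Phi | z from its Dirichlet full conditional,
    # independently for each topic row.
    phi = np.array([rng.dirichlet(beta + n_kw[k]) for k in range(K)])

    # Step 2: draw z | Phi. Given Phi, documents are conditionally
    # independent, so this loop is the part that can be parallelized.
    for d, ws in enumerate(docs):
        for i, w in enumerate(ws):
            n_dk[d, z[d][i]] -= 1                  # remove current token
            p = phi[:, w] * (n_dk[d] + alpha)      # p(z = k | rest, Phi)
            z[d][i] = rng.choice(K, p=p / p.sum())
            n_dk[d, z[d][i]] += 1
```

In the paper's full algorithm, Step 2 additionally exploits sparsity in the count matrices to reduce the cost of sampling each indicator, and Step 1 factorizes over topics, so it too can be drawn in parallel.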
Pages: 449 - 463
Number of pages: 15
Related Papers
50 records in total
  • [1] Scalable Collapsed Inference for High-Dimensional Topic Models
    Islam, Rashidul
    Foulds, James
2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019: 2836 - 2845
  • [2] Distributed, partially collapsed MCMC for Bayesian nonparametrics
Dubey, Avinava
Zhang, Michael M.
    Xing, Eric P.
    Williamson, Sinead A.
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020: 3685 - 3694
  • [3] Partially collapsed parallel Gibbs sampler for Dirichlet process mixture models
    Yerebakan, Halid Ziya
    Dundar, Murat
    PATTERN RECOGNITION LETTERS, 2017, 90 : 22 - 27
  • [4] Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models
    Terenin, Alexander
    Magnusson, Mans
    Jonsson, Leif
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 2925 - 2934
  • [5] Inference in MCMC step selection models
    Michelot, Theo
    Blackwell, Paul G.
    Chamaille-Jammes, Simon
    Matthiopoulos, Jason
    BIOMETRICS, 2020, 76 (02) : 438 - 447
  • [6] Stochastic Collapsed Variational Bayesian Inference for Biterm Topic Model
    Awaya, Narutaka
    Kitazono, Jun
    Omori, Toshiaki
    Ozawa, Seiichi
2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016: 3364 - 3370
  • [7] Online Sparse Collapsed Hybrid Variational-Gibbs Algorithm for Hierarchical Dirichlet Process Topic Models
    Burkhardt, Sophie
    Kramer, Stefan
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT II, 2017, 10535 : 189 - 204
  • [8] A PARTIALLY COLLAPSED GIBBS SAMPLER FOR UNSUPERVISED NONNEGATIVE SPARSE SIGNAL RESTORATION
    Amrouche, M. C.
    Carfantan, H.
    Idier, J.
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 5519 - 5523
  • [9] Interpretation and inference in mixture models: Simple MCMC works
    Geweke, John
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 51 (07) : 3529 - 3550
  • [10] Parallel Local Approximation MCMC for Expensive Models
    Conrad, Patrick R.
    Davis, Andrew D.
    Marzouk, Youssef M.
    Pillai, Natesh S.
    Smith, Aaron
SIAM-ASA JOURNAL ON UNCERTAINTY QUANTIFICATION, 2018, 6 (01): 339 - 373