Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models

Cited by: 6
Authors
Magnusson, Mans [1]
Jonsson, Leif [1,2]
Villani, Mattias [1]
Broman, David [3]
Affiliations
[1] Linkoping Univ, Dept Comp & Informat Sci, S-58183 Linkoping, Sweden
[2] Ericsson AB, Stockholm, Sweden
[3] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
Keywords
Bayesian inference; Computational complexity; Gibbs sampling; Latent Dirichlet allocation; Massive datasets; Parallel computing
DOI
10.1080/10618600.2017.1366913
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Topic models, and more specifically the class of latent Dirichlet allocation (LDA), are widely used for probabilistic modeling of text. Markov chain Monte Carlo (MCMC) sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler. Supplementary materials for this article are available online.
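The structural idea behind the partial collapsing is that integrating out only the document-topic proportions θ, while keeping the topic-word matrix Φ as an explicitly sampled block, makes the topic indicators of different documents conditionally independent given Φ, so the indicator step can be distributed across cores. The Python sketch below illustrates one such sweep under simplifying assumptions: a toy corpus, illustrative hyperparameters, and dense (non-sparse, single-threaded) sampling. It is not the authors' implementation, which additionally exploits sparsity in the full conditionals and runs the indicator step in parallel.

```python
import numpy as np

# Minimal sketch of a partially collapsed Gibbs sampler for LDA: theta
# (document-topic proportions) is integrated out, Phi (topic-word
# distributions) is kept and sampled explicitly. Given Phi, the topic
# indicators of different documents are conditionally independent, which
# is what makes the z-step parallelizable over documents. All names,
# sizes, and hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)

K, V = 5, 100            # number of topics and vocabulary size (toy values)
alpha, beta = 0.5, 0.5   # symmetric Dirichlet hyperparameters (toy values)

# Toy corpus: each document is an array of word indices.
docs = [rng.integers(V, size=rng.integers(20, 50)) for _ in range(10)]

# Initialize topic indicators and the count matrices they induce.
z = [rng.integers(K, size=len(d)) for d in docs]
n_dk = np.zeros((len(docs), K))   # document-topic counts
n_kw = np.zeros((K, V))           # topic-word counts
for d, (words, zd) in enumerate(zip(docs, z)):
    for w, k in zip(words, zd):
        n_dk[d, k] += 1
        n_kw[k, w] += 1

def sweep(Phi):
    """One partially collapsed sweep: draw z | Phi, then Phi | z."""
    # Step 1: topic indicators. p(z_di = k | Phi, rest of document d) is
    # proportional to (n_dk^{-di} + alpha) * Phi[k, w]; documents are
    # independent given Phi, so this outer loop could run in parallel.
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            k_old = z[d][i]
            n_dk[d, k_old] -= 1
            n_kw[k_old, w] -= 1
            p = (n_dk[d] + alpha) * Phi[:, w]
            k_new = rng.choice(K, p=p / p.sum())
            z[d][i] = k_new
            n_dk[d, k_new] += 1
            n_kw[k_new, w] += 1
    # Step 2: Phi's full conditional is Dirichlet, sampled row by row.
    return np.array([rng.dirichlet(beta + n_kw[k]) for k in range(K)])

# Draw an initial Phi from its conditional, then iterate.
Phi = np.array([rng.dirichlet(beta + n_kw[k]) for k in range(K)])
for _ in range(100):
    Phi = sweep(Phi)
```

In a fully collapsed sampler, Φ is also integrated out and every token update couples all documents through the shared topic-word counts, which is what blocks exact parallelization; sampling Φ as a block trades a small loss in statistical efficiency for conditional independence across documents, the trade-off quantified in the abstract above.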
Pages: 449-463
Page count: 15
Related Articles (showing items [21]-[30] of 50)
  • [21] Janssenswillen, Gert; Depaire, Benoit; Faes, Christel. Enhancing Discovered Process Models Using Bayesian Inference and MCMC. Business Process Management Workshops, BPM 2020 International Workshops, 2020, 397: 295-307.
  • [22] Perez, C. J.; Martin, J.; Rufo, M. J. Sensitivity estimations for Bayesian inference models solved by MCMC methods. Reliability Engineering & System Safety, 2006, 91(10-11): 1310-1314.
  • [23] Bennett, Iris; Martin, Donald E. K.; Lahiri, Soumendra Nath. Fitting sparse Markov models through a collapsed Gibbs sampler. Computational Statistics, 2023, 38: 1977-1994.
  • [24] Ingraham, John; Marks, Debora. Variational Inference for Sparse and Undirected Models. International Conference on Machine Learning, Vol. 70, 2017.
  • [25] Paap, R. What are the advantages of MCMC based inference in latent variable models? Statistica Neerlandica, 2002, 56(1): 2-22.
  • [26] Lou, Yin; Bien, Jacob; Caruana, Rich; Gehrke, Johannes. Sparse Partially Linear Additive Models. Journal of Computational and Graphical Statistics, 2016, 25(4): 1026-1040.
  • [27] Gustafson, Paul. Bayesian Inference for Partially Identified Models. International Journal of Biostatistics, 2010, 6(2).
  • [28] Li, Song; Tso, Geoffrey K. F.; Long, Lufan. Powered embarrassing parallel MCMC sampling in Bayesian inference, a weighted average intuition. Computational Statistics & Data Analysis, 2017, 115: 11-20.
  • [29] Raggi, D. Adaptive MCMC methods for inference on affine stochastic volatility models with jumps. Econometrics Journal, 2005, 8(2): 235-250.
  • [30] Dinari, Or; Yu, Angel; Freifeld, Oren; Fisher, John W., III. Distributed MCMC Inference in Dirichlet Process Mixture Models Using Julia. 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2019: 518-525.