Asynchronous distributed estimation of topic models for document analysis

Cited by: 8
Authors
Asuncion, Arthur U. [1 ]
Smyth, Padhraic [1 ]
Welling, Max [1 ]
Affiliations
[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92717 USA
Keywords
Topic model; Distributed learning; Parallelization; Gibbs sampling
DOI
10.1016/j.stamet.2010.03.002
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Given the prevalence of large data sets and the availability of inexpensive parallel computing hardware, there is significant motivation to explore distributed implementations of statistical learning algorithms. In this paper, we present a distributed learning framework for Latent Dirichlet Allocation (LDA), a well-known Bayesian latent variable model for sparse matrices of count data. In the proposed approach, data are distributed across P processors, and processors independently perform inference on their local data and communicate their sufficient statistics in a local asynchronous manner with other processors. We apply two different approximate inference techniques for LDA, collapsed Gibbs sampling and collapsed variational inference, within a distributed framework. The results show significant improvements in computation time and memory when running the algorithms on very large text corpora using parallel hardware. Despite the approximate nature of the proposed approach, simulations suggest that asynchronous distributed algorithms are able to learn models that are nearly as accurate as those learned by the standard non-distributed approaches. We also find that our distributed algorithms converge rapidly to good solutions. (C) 2010 Elsevier B.V. All rights reserved.
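The scheme the abstract describes, P workers independently running collapsed Gibbs sampling on their document shards and asynchronously gossiping word-topic count sufficient statistics to peers, can be sketched as a toy round-based simulation. This is an illustrative reconstruction, not the authors' implementation: the function name `async_lda`, the push-to-one-random-peer gossip rule, and the tiny corpus are all assumptions made for the sketch.

```python
import numpy as np

def async_lda(docs, V, K=2, P=2, alpha=0.5, beta=0.1, sweeps=30, seed=0):
    """Toy round-based simulation of asynchronous distributed collapsed
    Gibbs sampling for LDA (a sketch of the paper's idea, not the
    authors' code). Each worker samples topics on its own document
    shard using exact local word-topic counts plus stale cached counts
    received from peers, then gossips its counts to one random peer."""
    rng = np.random.default_rng(seed)
    shards = [list(range(p, len(docs), P)) for p in range(P)]  # doc d lives on worker d % P

    z = [rng.integers(K, size=len(doc)) for doc in docs]       # topic of each token
    ndk = np.zeros((len(docs), K), int)                        # doc-topic counts
    local_nw = np.zeros((P, K, V), int)                        # each worker's own word-topic counts
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1
            local_nw[d % P, z[d][i], w] += 1

    cache = np.zeros((P, P, K, V), int)                        # cache[p, q]: p's stale view of q's counts

    for _ in range(sweeps):
        for p in range(P):                                     # workers run "concurrently"; simulated in turn
            stale = cache[p].sum(axis=0)                       # peers' counts as last gossiped (cache[p, p] stays 0)
            for d in shards[p]:
                for i, w in enumerate(docs[d]):
                    k = z[d][i]
                    ndk[d, k] -= 1                             # remove token from counts
                    local_nw[p, k, w] -= 1
                    nw = local_nw[p] + stale
                    # collapsed conditional: p(z=k) is proportional to
                    # (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                    probs = (ndk[d] + alpha) * (nw[:, w] + beta) / (nw.sum(axis=1) + V * beta)
                    k = rng.choice(K, p=probs / probs.sum())
                    z[d][i] = k
                    ndk[d, k] += 1                             # add token back under its new topic
                    local_nw[p, k, w] += 1
            q = (p + 1 + rng.integers(P - 1)) % P              # gossip: push own counts to a random peer
            cache[q, p] = local_nw[p]                          # peer stores a snapshot (numpy copies on assign)
    return local_nw.sum(axis=0)                                # merged word-topic counts across workers

# tiny synthetic corpus: two word groups over a 4-word vocabulary
docs = [[0, 0, 1, 1]] * 4 + [[2, 2, 3, 3]] * 4
nw = async_lda(docs, V=4, K=2, P=2)
assert nw.sum() == sum(len(doc) for doc in docs)               # every token assigned exactly once
```

On real hardware the workers would run in parallel processes and gossip over the network; the sequential loop here only mimics that interleaving, and the staleness of `cache` captures why the method is approximate.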
Pages: 3-17 (15 pages)
Related papers (50 total)
  • [21] Integrating social annotations into topic models for personalized document retrieval
    Xu, Bo
    Lin, Hongfei
    Lin, Yuan
    Guan, Yizhou
    SOFT COMPUTING, 2020, 24 (03) : 1707 - 1716
  • [22] Asynchronous Distributed Nonlinear Estimation Over Directed Networks
    Wang, Qianyao
    Yu, Rui
    Meng, Min
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2024, 11 (02): : 2062 - 2073
  • [25] Statistical topic models for multi-label document classification
    Rubin, Timothy N.
    Chambers, America
    Smyth, Padhraic
    Steyvers, Mark
    MACHINE LEARNING, 2012, 88 (1-2) : 157 - 208
  • [26] Expert-Informed Topic Models for Document Set Discovery
    Rinke, Eike Mark
    Dobbrick, Timo
    Loeb, Charlotte
    Zirn, Cäcilia
    Wessler, Hartmut
    COMMUNICATION METHODS AND MEASURES, 2022, 16 (01) : 39 - 58
  • [27] Distributed Sequential Estimation in Asynchronous Wireless Sensor Networks
    Hlinka, Ondrej
    Hlawatsch, Franz
    Djuric, Petar M.
    IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (11) : 1965 - 1969
  • [28] Robust Unsupervised Segmentation of Degraded Document Images with Topic Models
    Burns, Timothy J.
    Corso, Jason J.
    CVPR: 2009 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-4, 2009, : 1287 - 1294
  • [29] A method of refining topic models based on term and document frequencies
    Higashi, K.
    Takahashi, H.
    Nakagawa, H.
    Tsuchiya, T.
    COMPUTER SOFTWARE, 2019, 36 (04) : 25 - 31
  • [30] Table Topic Models for Hidden Unit Estimation
    Yoshida, Minoru
    Matsumoto, Kazuyuki
    Kita, Kenji
    INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2016, 2016, 9994 : 302 - 307