Principled Selection of Hyperparameters in the Latent Dirichlet Allocation Model

Authors
George, Clint P. [1]
Doss, Hani [2]
Affiliations
[1] Univ Florida, Inst Informat, Gainesville, FL 32611 USA
[2] Univ Florida, Dept Stat, Gainesville, FL 32611 USA
Keywords
Empirical Bayes inference; latent Dirichlet allocation; Markov chain Monte Carlo; model selection; topic modelling
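DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract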
Latent Dirichlet Allocation (LDA) is a well-known topic model that is often used to make inferences about the properties of collections of text documents. LDA is a hierarchical Bayesian model and involves a prior distribution on a set of latent topic variables. The prior is indexed by certain hyperparameters, and even though these have a large impact on inference, they are usually chosen either in an ad hoc manner or by applying an algorithm whose theoretical basis has not been firmly established. We present a method, based on a combination of Markov chain Monte Carlo and importance sampling, for estimating the maximum likelihood estimate of the hyperparameters. The method may be viewed as a computational scheme for implementing an empirical Bayes analysis. It comes with theoretical guarantees, and a key feature of our approach is that we provide theoretically valid error margins for our estimates. Experiments on both synthetic and real data show good performance of our methodology.
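The idea of combining posterior sampling with importance sampling to pick a hyperparameter can be illustrated on a toy problem. The sketch below is not the authors' algorithm. It assumes a simple Dirichlet-multinomial model (one symmetric Dirichlet hyperparameter `alpha`, one count vector per document) in which the posterior can be sampled exactly, so exact draws stand in for Markov chain Monte Carlo output. It relies on the standard identity that the ratio of marginal likelihoods m(alpha)/m(alpha_ref) equals the posterior expectation, under `alpha_ref`, of the ratio of the two prior densities. All function names, the data, and the grid are hypothetical.

```python
import math
import random


def log_norm(alpha, k):
    """Log normalising constant of a symmetric k-dimensional Dirichlet(alpha)."""
    return math.lgamma(k * alpha) - k * math.lgamma(alpha)


def sample_dirichlet(params, rng):
    """Draw one Dirichlet vector by normalising independent Gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(gammas)
    return [g / total for g in gammas]


def log_prior_ratio(thetas, alpha, alpha_ref):
    """Log of prior(thetas; alpha) / prior(thetas; alpha_ref), summed over documents."""
    out = 0.0
    for theta in thetas:
        k = len(theta)
        out += log_norm(alpha, k) - log_norm(alpha_ref, k)
        out += (alpha - alpha_ref) * sum(math.log(t) for t in theta)
    return out


def estimate_log_bayes_factor(draws, alpha, alpha_ref):
    """Importance-sampling estimate of log[m(alpha) / m(alpha_ref)].

    `draws` must come from the posterior under alpha_ref; the estimate is the
    log of the average prior ratio, computed with a log-sum-exp for stability.
    """
    logs = [log_prior_ratio(thetas, alpha, alpha_ref) for thetas in draws]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / len(logs))


def exact_log_bayes_factor(counts, alpha, alpha_ref):
    """Closed-form log ratio, available only because this toy model is conjugate."""
    def log_marginal(a):
        return sum(
            log_norm(a, len(n))
            + sum(math.lgamma(a + c) for c in n)
            - math.lgamma(len(n) * a + sum(n))
            for n in counts
        )
    return log_marginal(alpha) - log_marginal(alpha_ref)


rng = random.Random(0)
counts = [[6, 2, 2], [2, 7, 3], [4, 4, 3]]  # hypothetical word counts per document
alpha_ref = 1.0                              # hyperparameter used to generate the draws

# Exact posterior draws (Dirichlet(alpha_ref + counts)) stand in for MCMC output.
draws = [
    [sample_dirichlet([alpha_ref + c for c in n], rng) for n in counts]
    for _ in range(5000)
]

# Maximise the estimated ratio over an arbitrary grid of candidate values.
grid = [0.6 + 0.2 * i for i in range(13)]
alpha_hat = max(grid, key=lambda a: estimate_log_bayes_factor(draws, a, alpha_ref))
print("estimated maximiser on the grid:", alpha_hat)
```

A single set of draws under `alpha_ref` is reused for every candidate value, which is what makes the approach attractive. The estimate degrades as candidates move far from `alpha_ref`, and with correlated MCMC draws any standard error would have to account for autocorrelation.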
Pages: 38
Related papers
  • [2] Topic Selection in Latent Dirichlet Allocation
    Wang, Biao
    Liu, Zelong
    Li, Maozhen
    Liu, Yang
    Qi, Man
    [J]. 2014 11TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2014, : 756 - 760
  • [3] LDAPrototype: a model selection algorithm to improve reliability of latent Dirichlet allocation
    Rieger, Jonas
    Jentsch, Carsten
    Rahnenfuehrer, Jorg
    [J]. PEERJ COMPUTER SCIENCE, 2024, 10
  • [4] Unsupervised Feature Selection for Latent Dirichlet Allocation
    Xu Weiran
    Du Gang
    Chen Guang
    Guo Jun
    Yang Jie
    [J]. CHINA COMMUNICATIONS, 2011, 8 (05) : 54 - 62
  • [5] Scalable Hyperparameter Selection for Latent Dirichlet Allocation
    Xia, Wei
    Doss, Hani
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2020, 29 (04) : 875 - 895
  • [6] Indexing by Latent Dirichlet Allocation and an Ensemble Model
    Wang, Yanshan
    Lee, Jae-Sung
    Choi, In-Chan
    [J]. JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2016, 67 (07) : 1736 - 1750
  • [7] Latent Dirichlet Allocation (LDA) Model and kNN Algorithm to Classify Research Project Selection
    Saf'ie, M. A.
    Utami, E.
    Fatta, H. A.
    [J]. INTERNATIONAL CONFERENCE ON ADVANCED MATERIALS FOR BETTER FUTURE 2017, 2018, 333
  • [8] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [9] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 601 - 608
  • [10] A Principled Approach to Expectation Maximisation and Latent Dirichlet Allocation Using Jeffrey's Update Rule
    Jacobs, Bart
    [J]. LOGIC, LANGUAGE, INFORMATION, AND COMPUTATION, WOLLIC 2023, 2023, 13923 : 256 - 273