Choosing the Number of Topics in LDA Models - A Monte Carlo Comparison of Selection Criteria

Authors
Bystrov, Victor [1]
Naboka-Krell, Viktoriia [2]
Staszewska-Bystrova, Anna [3]
Winker, Peter [2]
Affiliations
[1] Univ Lodz, Fac Econ & Sociol, Rewolucji 1905r 41, PL-90214 Lodz, Poland
[2] Justus Liebig Univ Giessen, Dept Stat & Econometr, Licher Str 64, D-35394 Giessen, Germany
[3] Univ Lodz, Fac Econ & Sociol, Rewolucji 1905r 37-39, PL-90214 Lodz, Poland
Keywords
Topic models; text analysis; latent Dirichlet allocation; singular Bayesian information criterion; Monte Carlo simulation; text generation
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Selecting the number of topics in latent Dirichlet allocation (LDA) models is considered a difficult task, for which various approaches have been proposed. This paper evaluates the performance of the recently developed singular Bayesian information criterion (sBIC) and compares it with that of alternative model selection criteria. The sBIC is a generalization of the standard BIC that can be applied to singular statistical models. The comparison is based on Monte Carlo simulations carried out for several alternative settings, which vary in the number of topics, the number of documents, and the size of the documents in the corpora. Performance is measured using several criteria that take into account not only whether the correct number of topics is selected, but also whether the relevant topics from the considered data generation processes (DGPs) are revealed. Practical recommendations for LDA model selection in applications are derived.
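To make the simulation design described in the abstract more concrete, the following is a minimal sketch of an LDA data generation process in which the number of topics, the number of documents, and the document size can be varied. It is not the authors' code: the function names, the vocabulary size, the Dirichlet hyperparameters, and the grid values are all illustrative assumptions, and the model-fitting and criterion-evaluation steps of a full Monte Carlo study are left out.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw one vector from a symmetric Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_topics, n_docs, doc_len,
                    vocab_size=50, alpha=0.5, beta=0.1, seed=0):
    """Generate a synthetic corpus from an LDA DGP.

    All default values are hypothetical choices for illustration only.
    Returns the true topic-word distributions and the documents,
    where each document is a list of integer word ids.
    """
    rng = random.Random(seed)
    # One word distribution per topic.
    topics = [sample_dirichlet(beta, vocab_size, rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Document-specific topic proportions.
        theta = sample_dirichlet(alpha, n_topics, rng)
        # Topic assignment for every word position in the document.
        assignments = rng.choices(range(n_topics), weights=theta, k=doc_len)
        # Draw each word from the distribution of its assigned topic.
        doc = [rng.choices(range(vocab_size), weights=topics[k], k=1)[0]
               for k in assignments]
        corpus.append(doc)
    return topics, corpus


# Illustrative grid of settings (values are assumptions, not from the paper).
for n_topics in (5, 10):
    for n_docs in (100, 200):
        for doc_len in (50, 100):
            _, corpus = generate_corpus(n_topics, n_docs, doc_len)
            print(n_topics, n_docs, doc_len, len(corpus))
```

In a full comparison, each generated corpus would be fitted with LDA models over a range of candidate topic numbers, and each selection criterion would be scored on how often it recovers the true number and content of the topics.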
Pages: 30