Analysis and tuning of hierarchical topic models based on Renyi entropy approach

Cited by: 0
Authors
Koltcov S. [1]
Ignatenko V. [1]
Terpilovskii M. [1]
Rosso P. [1,2]
Affiliations
[1] Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, St. Petersburg
[2] Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Valencia
Keywords
Data Mining and Machine Learning; Data Science; Hierarchical topic models; Natural Language and Speech; Optimal number of topics; Renyi entropy; Topic modeling
DOI
10.7717/PEERJ-CS.608
Abstract
Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections that additionally allows constructing a hierarchy representing the levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to this problem. First, we introduce a Renyi entropy-based quality metric for hierarchical models. Second, we propose a practical approach to obtaining the “correct” number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that purpose. We test this approach on datasets with a known number of topics, as determined by human mark-up; three of these datasets are in English and one is in Russian. In the numerical experiments, we consider three hierarchical models: the hierarchical latent Dirichlet allocation model (hLDA), the hierarchical Pachinko allocation model (hPAM), and hierarchical additive regularization of topic models (hARTM). We demonstrate that the hLDA model possesses a significant level of instability and, moreover, that the derived numbers of topics are far from the true numbers for the labeled datasets. For the hPAM model, the Renyi entropy approach allows determining only one level of the data structure. For the hARTM model, the proposed approach allows us to estimate the number of topics for two levels of the hierarchy. © 2021 Koltcov et al. All Rights Reserved.
Pages: 1 / 35
Page count: 34