Analysis and tuning of hierarchical topic models based on Renyi entropy approach

Cited by: 0
Authors
Koltcov S. [1]
Ignatenko V. [1]
Terpilovskii M. [1]
Rosso P. [1, 2]
Affiliations
[1] Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, St. Petersburg
[2] Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Valencia
Keywords
Data Mining and Machine Learning; Data Science; Hierarchical topic models; Natural Language and Speech; Optimal number of topics; Renyi entropy; Topic modeling
DOI
10.7717/PEERJ-CS.608
Abstract
Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections that additionally allows constructing a hierarchy representing the levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to this problem. First, we introduce a Renyi entropy-based metric of quality for hierarchical models. Second, we propose a practical approach to obtaining the “correct” number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that purpose. We test this approach on datasets with a known number of topics, as determined by human mark-up; three of these datasets are in English and one is in Russian. In the numerical experiments, we consider three hierarchical models: the hierarchical latent Dirichlet allocation model (hLDA), the hierarchical Pachinko allocation model (hPAM), and hierarchical additive regularization of topic models (hARTM). We demonstrate that the hLDA model possesses a significant level of instability and, moreover, that the derived numbers of topics are far from the true numbers for the labeled datasets. For the hPAM model, the Renyi entropy approach allows determining only one level of the data structure. For the hARTM model, the proposed approach allows us to estimate the number of topics for two levels of the hierarchy. © 2021 Koltcov et al. All Rights Reserved.
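For orientation, the quantity the abstract builds on can be illustrated with the textbook Renyi entropy of order alpha for a discrete distribution, H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), which reduces to Shannon entropy as alpha approaches 1. The sketch below is only this generic definition, not the paper's hierarchical quality metric; the function name `renyi_entropy` and the example distributions are hypothetical and chosen for illustration.

```python
import math


def renyi_entropy(probs, alpha):
    """Textbook Renyi entropy (natural log) of a discrete distribution.

    Illustrative only: this is the generic definition, not the
    hierarchical-model metric proposed in the paper.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")

    # Zero-probability outcomes contribute nothing to the entropy.
    positive = [p for p in probs if p > 0]

    # alpha -> 1 is the Shannon limit of the Renyi family.
    if math.isclose(alpha, 1.0):
        return -sum(p * math.log(p) for p in positive)

    return math.log(sum(p ** alpha for p in positive)) / (1.0 - alpha)


# A flat distribution over 4 outcomes versus a peaked one (made-up numbers).
flat = [0.25, 0.25, 0.25, 0.25]
peaked = [0.7, 0.2, 0.1, 0.0]
flat_h2 = renyi_entropy(flat, 2.0)
peaked_h2 = renyi_entropy(peaked, 2.0)
```

For a uniform distribution the value equals the logarithm of the number of outcomes for every alpha, and a more concentrated distribution gives a lower value, which is the general property that makes an entropy usable as a signal when comparing candidate numbers of topics.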
Pages: 1 / 35
Number of pages: 34
Related papers
50 in total
  • [21] Multifractal weighted permutation analysis based on Renyi entropy for financial time series
    Liu, Zhengli
    Shang, Pengjian
    Wang, Yuanyuan
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2019, 536
  • [22] Multiscale multifractal multiproperty analysis of financial time series based on Renyi entropy
    Yang Yujun
    Li Jianping
    Yang Yimei
    INTERNATIONAL JOURNAL OF MODERN PHYSICS C, 2017, 28 (02)
  • [23] Renyi entropy of a line in two-dimensional Ising models
    Stephan, J. -M.
    Misguich, G.
    Pasquier, V.
    PHYSICAL REVIEW B, 2010, 82 (12)
  • [24] Effects of TFD Thresholding On EEG Signal Analysis Based On The Local Renyi Entropy
    Lerga, Jonatan
    Saulig, Nicoletta
    Lerga, Rebeka
    Milanovic, Zeljka
    2017 2ND INTERNATIONAL MULTIDISCIPLINARY CONFERENCE ON COMPUTER AND ENERGY SCIENCE (SPLITECH), 2017, : 6 - 11
  • [25] A Web Service Matchmaking Approach based on Topic Models
    Yu Peng
    Liu Junju
    Wang Jian
    PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019, : 604 - 607
  • [26] A Topic Coverage Approach to Evaluation of Topic Models
    Korencic, Damir
    Ristov, Strahil
    Repar, Jelena
    Snajder, Jan
    IEEE ACCESS, 2021, 9 : 123280 - 123312
  • [27] Hierarchical Topic Models for Expanding Category Hierarchies
    Yamamoto, Kohei
    Eguchi, Koji
    Takasu, Atsuhiro
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2019, : 242 - 249
  • [28] Latent tree models for hierarchical topic detection
    Chen, Peixian
    Zhang, Nevin L.
    Liu, Tengfei
    Poon, Leonard K. M.
    Chen, Zhourong
    Khawar, Farhan
    ARTIFICIAL INTELLIGENCE, 2017, 250 : 105 - 124
  • [29] Adaptive Algorithm Based on Renyi's Entropy for Task Mapping in a Hierarchical Wireless Network-on-Chip Architecture
    Sacanamboy, Maribell
    Bolanos, Freddy
    Bernal, Alvaro
    COMPUTACION Y SISTEMAS, 2018, 22 (03): : 985 - 996
  • [30] An Approach to Canonical Correlation Analysis Based on Renyi's Pseudodistances
    Jaenada, Maria
    Miranda, Pedro
    Pardo, Leandro
    Zografos, Konstantinos
    ENTROPY, 2023, 25 (05)