Analysis and tuning of hierarchical topic models based on Renyi entropy approach

Cited by: 0

Authors:

Koltcov S. [1]

Ignatenko V. [1]

Terpilovskii M. [1]

Rosso P. [1, 2]

Affiliations:

[1] Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, St. Petersburg

[2] Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Valencia

Source:

PeerJ Computer Science | 2021 / Volume 7

Keywords:

Data Mining and Machine Learning; Data Science; Hierarchical topic models; Natural Language and Speech; Optimal number of topics; Renyi entropy; Topic modeling

DOI:

10.7717/PEERJ-CS.608

Abstract:

Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections; it additionally allows constructing a hierarchy that represents levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to this problem. First, we introduce a Renyi entropy-based quality metric for hierarchical models. Second, we propose a practical approach to obtaining the “correct” number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that purpose. We test this approach on four datasets with a known number of topics, as determined by human mark-up: three in English and one in Russian. In the numerical experiments, we consider three hierarchical models: the hierarchical latent Dirichlet allocation model (hLDA), the hierarchical Pachinko allocation model (hPAM), and hierarchical additive regularization of topic models (hARTM). We demonstrate that the hLDA model is significantly unstable and, moreover, that the numbers of topics it derives are far from the true numbers for the labeled datasets. For the hPAM model, the Renyi entropy approach allows determining only one level of the data structure. For the hARTM model, the proposed approach allows us to estimate the number of topics for two levels of the hierarchy. © 2021 Koltcov et al. All Rights Reserved.
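
For readers unfamiliar with the quantity named in the abstract, the following is a minimal sketch of the textbook Renyi entropy of order alpha for a discrete distribution, H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha). It is not the paper's quality metric for hierarchical models, which the abstract does not spell out; the function name `renyi_entropy` and its arguments are illustrative assumptions.

```python
import math


def renyi_entropy(probs, alpha):
    """Textbook Renyi entropy (natural log) of a discrete distribution.

    Illustrative helper only: this is the generic definition, not the
    topic-model metric proposed in the paper.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(probs)
    # Normalize and drop zero-probability entries (they contribute nothing).
    p = [x / total for x in probs if x > 0]
    if math.isclose(alpha, 1.0):
        # The limit alpha -> 1 recovers the Shannon entropy.
        return -sum(x * math.log(x) for x in p)
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)


# A uniform distribution over 4 outcomes has entropy log(4) for every order.
uniform = [0.25, 0.25, 0.25, 0.25]
h_uniform = renyi_entropy(uniform, 2.0)
```

For a fixed distribution the value does not increase as alpha grows, so lower orders weight rare outcomes more heavily than higher orders do.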

Pages: 1 / 35

Page count: 34
