Additive regularization of topic models

Cited by: 59
Authors
Vorontsov, Konstantin [1]
Potapenko, Anna [2]
Affiliations
[1] RAS, Inst Phys & Technol, Dept Intelligent Syst, Dorodnicyn Comp Ctr, Moscow 117901, Russia
[2] Higher Sch Econ, Dept Comp Sci, Moscow, Russia
Funding
Russian Foundation for Basic Research
Keywords
Probabilistic topic modeling; Regularization of ill-posed problems; Probabilistic latent semantic analysis; Latent Dirichlet allocation; EM-algorithm
DOI
10.1007/s10994-014-5476-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Probabilistic topic modeling of text collections has recently been developed mainly within the framework of graphical models and Bayesian inference. In this paper we introduce an alternative semi-probabilistic approach, which we call additive regularization of topic models (ARTM). Instead of building a purely probabilistic generative model of text, we regularize an ill-posed problem of stochastic matrix factorization by maximizing a weighted sum of the log-likelihood and additional criteria. This approach enables us to combine probabilistic assumptions with linguistic and problem-specific requirements in a single multi-objective topic model. In the theoretical part of the work we derive the regularized EM-algorithm and provide a pool of regularizers, which can be applied together in any combination. We show that many models previously developed within the Bayesian framework can be inferred more easily within ARTM and in some cases generalized. In the experimental part we show that a combination of sparsing, smoothing, and decorrelation improves several quality measures at once with almost no loss of the likelihood.
Pages: 303-323
Page count: 21
Related papers
50 items in total
  • [1] Additive regularization of topic models
    Konstantin Vorontsov
    Anna Potapenko
    [J]. Machine Learning, 2015, 101 : 303 - 323
  • [2] Topic Balancing with Additive Regularization of Topic Models
    Veselova, Eugeniia
    Vorontsov, Konstantin
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020): STUDENT RESEARCH WORKSHOP, 2020 : 59 - 65
  • [3] Additive Regularization of Topic Models for Topic Selection and Sparse Factorization
    Vorontsov, Konstantin
    Potapenko, Anna
    Plavin, Alexander
    [J]. STATISTICAL LEARNING AND DATA SCIENCES, 2015, 9047 : 193 - 202
  • [4] Convergence of the Algorithm of Additive Regularization of Topic Models
    Irkhin, I. A.
    Vorontsov, K. V.
    [J]. PROCEEDINGS OF THE STEKLOV INSTITUTE OF MATHEMATICS, 2021, 315 (SUPPL 1) : S128 - S139
  • [5] Additive regularization for topic models of text collections
    K. V. Vorontsov
    [J]. Doklady Mathematics, 2014, 89 : 301 - 304
  • [6] CONVERGENCE OF THE ALGORITHM OF ADDITIVE REGULARIZATION OF TOPIC MODELS
    Irkhin, I. A.
    Vorontsov, K. V.
    [J]. TRUDY INSTITUTA MATEMATIKI I MEKHANIKI URO RAN, 2020, 26 (03) : 56 - +
  • [7] Additive regularization for topic models of text collections
    Vorontsov, K. V.
    [J]. DOKLADY MATHEMATICS, 2014, 89 (03) : 301 - 304
  • [8] Convergence of the Algorithm of Additive Regularization of Topic Models
    I. A. Irkhin
    K. V. Vorontsov
    [J]. Proceedings of the Steklov Institute of Mathematics, 2021, 315 : S128 - S139
  • [9] Coherence Regularization for Neural Topic Models
    Krasnashchok, Katsiaryna
    Cherif, Aymen
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2019, PT I, 2019, 11554 : 426 - 433
  • [10] Regularization methods for additive models
    Avalos, M
    Grandvalet, Y
    Ambroise, C
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS V, 2003, 2810 : 509 - 520