Stochastic Variational Optimization of a Hierarchical Dirichlet Process Latent Beta-Liouville Topic Model

Cited by: 1
Authors
Ihou, Koffi Eddy [1]
Amayri, Manar [2]
Bouguila, Nizar [1]
Affiliations
[1] Concordia Univ, Montreal, PQ H3G 1M8, Canada
[2] Grenoble Inst Technol, F-38031 Grenoble, France
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Hierarchical Dirichlet process; Bayesian nonparametric topic model; Beta-Liouville distribution; stochastic and variational optimizations; predictive distributions; Poisson-Dirichlet; mixture models; distributions
DOI
10.1145/3502727
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Subject classification code
0812
Abstract
In topic models, a corpus is organized as a collection of documents, each arising as a mixture over latent clusters called topics, where a topic is a distribution over the vocabulary. In large-scale applications, parametric (finite) topic mixture models such as LDA (latent Dirichlet allocation) and its variants are restrictive because of their reduced hypothesis space. In this article, we address the problems of model selection and of topic sharing across documents that affect standard parametric topic models. As an alternative, we propose a BNP (Bayesian nonparametric) topic model in which an HDP (hierarchical Dirichlet process) prior models the documents' topic mixtures through their multinomials on the infinite simplex. We place an asymmetric BL (Beta-Liouville) distribution as a diffuse base measure of the corpus-level DP (Dirichlet process) over a measurable space; this captures the highly heterogeneous structure of the set of topics that describes the corpus probability measure. For consistency in posterior inference and predictive distributions, we efficiently characterize the random probability measures whose limits are the global and local DPs, approximating the HDP through its stick-breaking formulation with GEM (Griffiths-Engen-McCloskey) random variables. Because the diffuse BL prior is conjugate to the count data distribution, we obtain an improved version of the standard HDP, which is usually based on a symmetric Dirichlet (Dir). In addition, to improve on the coordinate ascent framework while retaining its deterministic nature, our model implements an online optimization method based on document-level stochastic variational inference with natural gradients, enabling fast topic learning on large collections of text documents. The high per-document predictive likelihood obtained, compared to the performance of competing models, is consistent with the robustness of our fully asymmetric BL-based HDP. Besides assessing the predictive accuracy of the model through the probability of held-out documents, we add a combination of metrics, namely topic coherence and topic diversity, to assess the quality and interpretability of the discovered topics, and we compare the performance of our model on these metrics against the standard symmetric LDA. We show that the performance of the online HDP-LBLA (latent BL allocation) model is the asymptote for parametric topic models. The accuracy of the results (improved predictive distributions on held-out documents) stems from the model's ability to efficiently characterize dependencies between documents (topic correlation), as documents can now easily share topics, resulting in a much more robust and realistic compression algorithm for information modeling.
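To make the construction concrete, the following is a minimal sketch of the two-level stick-breaking representation the abstract invokes, written in standard notation rather than the paper's own: H denotes the asymmetric Beta-Liouville base measure, and gamma and alpha are assumed corpus- and document-level concentration parameters.

    % Corpus-level DP: GEM(gamma) stick-breaking weights over topic atoms drawn from H
    \beta'_k \sim \mathrm{Beta}(1,\gamma), \qquad
    \beta_k = \beta'_k \prod_{l=1}^{k-1} (1 - \beta'_l), \qquad
    \phi_k \sim H, \qquad
    G_0 = \sum_{k=1}^{\infty} \beta_k \, \delta_{\phi_k}

    % Document-level DP for document d: local sticks select among the global atoms
    \pi'_{dt} \sim \mathrm{Beta}(1,\alpha), \qquad
    \pi_{dt} = \pi'_{dt} \prod_{s=1}^{t-1} (1 - \pi'_{ds}), \qquad
    c_{dt} \sim \mathrm{Mult}(\beta), \qquad
    G_d = \sum_{t=1}^{\infty} \pi_{dt} \, \delta_{\phi_{c_{dt}}}

The weights (beta_k) are exactly the GEM(gamma) random variables mentioned in the abstract; truncating both infinite sums at finite levels yields the random probability measures whose limits recover the global and local DPs.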
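The online optimization described above follows the generic stochastic variational inference recipe: sample one document, fit its local variational parameters by coordinate ascent, then take a natural-gradient step on the global parameters with a Robbins-Monro step size. The Python sketch below illustrates that recipe on a truncated, Dirichlet-parameterized surrogate of the model; the paper's actual updates use the Beta-Liouville parameterization, and every function name and hyperparameter value here is an illustrative assumption, not the authors' code.

    import numpy as np
    from scipy.special import digamma

    def local_step(n_d, lam, alpha, n_iter=50):
        """Coordinate ascent on one document's local variational parameters.
        n_d: (V,) word counts; lam: (K, V) global variational parameters;
        alpha: (K,) document-level concentration. Returns (gamma_d, phi)."""
        Elog_beta = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
        gamma_d = np.ones(lam.shape[0])
        for _ in range(n_iter):
            Elog_theta = digamma(gamma_d) - digamma(gamma_d.sum())
            log_phi = Elog_theta[:, None] + Elog_beta        # (K, V) word-topic scores
            log_phi -= log_phi.max(axis=0, keepdims=True)    # numerical stabilization
            phi = np.exp(log_phi)
            phi /= phi.sum(axis=0, keepdims=True)            # per-word responsibilities
            gamma_d = alpha + (phi * n_d[None, :]).sum(axis=1)
        return gamma_d, phi

    def svi_step(lam, n_d, phi, eta, D, t, tau0=1.0, kappa=0.7):
        """One stochastic natural-gradient step on the global parameters."""
        rho = (tau0 + t) ** (-kappa)              # Robbins-Monro step size
        lam_hat = eta + D * (phi * n_d[None, :])  # noisy corpus-wide estimate from one doc
        return (1.0 - rho) * lam + rho * lam_hat  # natural gradient as a convex combination

    # Illustrative usage on synthetic counts
    rng = np.random.default_rng(0)
    K, V, D = 10, 1000, 5000
    lam = rng.gamma(1.0, 1.0, size=(K, V))
    for t in range(1, 4):
        n_d = rng.poisson(0.05, size=V).astype(float)
        gamma_d, phi = local_step(n_d, lam, alpha=np.full(K, 0.1))
        lam = svi_step(lam, n_d, phi, eta=0.01, D=D, t=t)

Because each update is a convex combination of the current global parameters and a single-document estimate rescaled to corpus size D, the cost per step is independent of the collection size, which is what allows this family of models to process large document streams.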
Pages: 48