Stochastic Variational Optimization of a Hierarchical Dirichlet Process Latent Beta-Liouville Topic Model

Cited by: 1
Authors
Ihou, Koffi Eddy [1]
Amayri, Manar [2]
Bouguila, Nizar [1]
Affiliations
[1] Concordia Univ, Montreal, PQ H3G 1M8, Canada
[2] Grenoble Inst Technol, F-38031 Grenoble, France
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Hierarchical Dirichlet process; Bayesian nonparametric topic model; Beta-Liouville distribution; stochastic and variational optimizations; predictive distributions; Poisson-Dirichlet; mixture models; distributions
DOI
10.1145/3502727
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Subject classification code
0812
Abstract
In topic models, a corpus is organized as a collection of documents, each arising as a mixture over latent clusters called topics, where a topic is a distribution over the vocabulary. In large-scale applications, parametric (finite) topic mixture models such as LDA (latent Dirichlet allocation) and its variants are restrictive because of their reduced hypothesis space. In this article, we address the problems of model selection and of topic sharing across documents that affect standard parametric topic models. As an alternative, we propose a BNP (Bayesian nonparametric) topic model in which an HDP (hierarchical Dirichlet process) prior models the documents' topic mixtures through their multinomials on the infinite simplex. We place an asymmetric BL (Beta-Liouville) distribution as a diffuse base measure of the corpus-level DP (Dirichlet process) over a measurable space; this captures the highly heterogeneous structure of the set of topics that describes the corpus probability measure. For consistency in posterior inference and predictive distributions, we efficiently characterize the random probability measures whose limits are the global and local DPs, approximating the HDP through its stick-breaking formulation with GEM (Griffiths-Engen-McCloskey) random variables. Because the diffuse BL prior is conjugate to the count data distribution, we obtain an improved version of the standard HDP, which is usually based on a symmetric Dirichlet (Dir). In addition, to improve on the coordinate ascent framework while retaining its deterministic nature, our model implements an online optimization method based on document-level stochastic variational inference with natural gradients, enabling fast topic learning on large collections of text documents. The high per-document predictive likelihood obtained, compared to the performance of competing models, is consistent with the robustness of our fully asymmetric BL-based HDP. Besides assessing the predictive accuracy of the model through the probability of held-out documents, we add a combination of metrics, namely topic coherence and topic diversity, to assess the quality and interpretability of the discovered topics, and we compare the performance of our model on these metrics against the standard symmetric LDA. We show that the performance of the online HDP-LBLA (latent BL allocation) model is the asymptote for parametric topic models. The accuracy of the results (improved predictive distributions on held-out documents) stems from the model's ability to efficiently characterize dependencies between documents (topic correlation), as documents can now easily share topics, resulting in a much more robust and realistic compression algorithm for information modeling.
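To make the construction concrete, the following is a minimal sketch of the two-level stick-breaking representation the abstract invokes, written in standard notation rather than the paper's own: H denotes the asymmetric Beta-Liouville base measure, and gamma and alpha are assumed corpus- and document-level concentration parameters.

    % Corpus-level DP: GEM(gamma) stick-breaking weights over topic atoms drawn from H
    \beta'_k \sim \mathrm{Beta}(1,\gamma), \qquad
    \beta_k = \beta'_k \prod_{l=1}^{k-1} (1 - \beta'_l), \qquad
    \phi_k \sim H, \qquad
    G_0 = \sum_{k=1}^{\infty} \beta_k \, \delta_{\phi_k}

    % Document-level DP for document d: local sticks select among the global atoms
    \pi'_{dt} \sim \mathrm{Beta}(1,\alpha), \qquad
    \pi_{dt} = \pi'_{dt} \prod_{s=1}^{t-1} (1 - \pi'_{ds}), \qquad
    c_{dt} \sim \mathrm{Mult}(\beta), \qquad
    G_d = \sum_{t=1}^{\infty} \pi_{dt} \, \delta_{\phi_{c_{dt}}}

The weights (beta_k) are exactly the GEM(gamma) random variables mentioned in the abstract; truncating both infinite sums at finite levels yields the random probability measures whose limits recover the global and local DPs.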
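The online optimization described above follows the generic stochastic variational inference recipe: sample one document, fit its local variational parameters by coordinate ascent, then take a natural-gradient step on the global parameters with a Robbins-Monro step size. The Python sketch below illustrates that recipe on a truncated, Dirichlet-parameterized surrogate of the model; the paper's actual updates use the Beta-Liouville parameterization, and every function name and hyperparameter value here is an illustrative assumption, not the authors' code.

    import numpy as np
    from scipy.special import digamma

    def local_step(n_d, lam, alpha, n_iter=50):
        """Coordinate ascent on one document's local variational parameters.
        n_d: (V,) word counts; lam: (K, V) global variational parameters;
        alpha: (K,) document-level concentration. Returns (gamma_d, phi)."""
        Elog_beta = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
        gamma_d = np.ones(lam.shape[0])
        for _ in range(n_iter):
            Elog_theta = digamma(gamma_d) - digamma(gamma_d.sum())
            log_phi = Elog_theta[:, None] + Elog_beta        # (K, V) word-topic scores
            log_phi -= log_phi.max(axis=0, keepdims=True)    # numerical stabilization
            phi = np.exp(log_phi)
            phi /= phi.sum(axis=0, keepdims=True)            # per-word responsibilities
            gamma_d = alpha + (phi * n_d[None, :]).sum(axis=1)
        return gamma_d, phi

    def svi_step(lam, n_d, phi, eta, D, t, tau0=1.0, kappa=0.7):
        """One stochastic natural-gradient step on the global parameters."""
        rho = (tau0 + t) ** (-kappa)              # Robbins-Monro step size
        lam_hat = eta + D * (phi * n_d[None, :])  # noisy corpus-wide estimate from one doc
        return (1.0 - rho) * lam + rho * lam_hat  # natural gradient as a convex combination

    # Illustrative usage on synthetic counts
    rng = np.random.default_rng(0)
    K, V, D = 10, 1000, 5000
    lam = rng.gamma(1.0, 1.0, size=(K, V))
    for t in range(1, 4):
        n_d = rng.poisson(0.05, size=V).astype(float)
        gamma_d, phi = local_step(n_d, lam, alpha=np.full(K, 0.1))
        lam = svi_step(lam, n_d, phi, eta=0.01, D=D, t=t)

Because each update is a convex combination of the current global parameters and a single-document estimate rescaled to corpus size D, the cost per step is independent of the collection size, which is what allows this family of models to process large document streams.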
Pages: 48