Entropy regularization in probabilistic clustering

Cited by: 0
Authors
Franzolini, Beatrice [1 ]
Rebaudo, Giovanni [2 ,3 ]
Affiliations
[1] Bocconi Univ, Dept Decis Sci, Milan, Italy
[2] Univ Turin, Turin, Italy
[3] Collegio Carlo Alberto, Turin, Italy
Source
STATISTICAL METHODS AND APPLICATIONS, 2024, Vol. 33, No. 1
Keywords
Dirichlet process; Loss functions; Mixture models; Unbalanced clusters; Random partition; DIRICHLET PROCESS; PARTITION DISTRIBUTION; OUTLIER DETECTION; MIXTURE-MODELS; INFERENCE; NUMBER;
DOI
10.1007/s10260-023-00716-y
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
Bayesian nonparametric mixture models are widely used to cluster observations. However, a major drawback of the approach is that the estimated partition often exhibits unbalanced cluster frequencies, with only a few dominating clusters and a large number of sparsely populated ones. This makes the results hard to interpret unless we are willing to ignore a substantial number of observations and clusters. Interpreting the posterior distribution as a penalized likelihood, we show that the imbalance is a direct consequence of the cost functions involved in estimating the partition. In light of these findings, we propose a novel Bayesian estimator of the clustering configuration. The proposed estimator is equivalent to a post-processing procedure that reduces the number of sparsely populated clusters and enhances interpretability, taking the form of an entropy regularization of the Bayesian estimate. Besides being computationally convenient relative to alternative strategies, it is theoretically justified as a correction to the Bayesian loss function used for point estimation and, as such, can be applied to any posterior distribution over clusterings, regardless of the specific model used.
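To make the idea of entropy-regularized point estimation concrete, the following is a minimal illustrative sketch, not the authors' exact estimator. It assumes the Binder loss as the base loss, restricts the search to partitions sampled by MCMC, and adds a hypothetical penalty lam times the entropy of the cluster-size distribution, which discourages partitions fragmented into many small clusters; the paper should be consulted for the actual loss, penalty form, and optimization strategy.

```python
# Illustrative sketch (assumptions: Binder base loss, search over MCMC samples,
# penalty lam * entropy of cluster sizes). Not the estimator from the paper.
import numpy as np

def binder_loss(a, b):
    """Binder loss between two partitions given as integer label vectors:
    the number of pairs of points on whose co-clustering the partitions disagree."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    return np.triu(same_a != same_b, k=1).sum()

def partition_entropy(labels):
    """Entropy of the empirical cluster-size distribution of a partition."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def entropy_regularized_estimate(posterior_partitions, lam=1.0):
    """Pick the sampled partition minimizing (normalized) expected Binder loss
    plus lam * entropy.

    posterior_partitions: array of shape (S, n), one sampled partition per row.
    lam is a hypothetical tuning parameter; lam = 0 recovers the usual
    Binder point estimate restricted to the sampled partitions.
    """
    S, n = posterior_partitions.shape
    best, best_score = None, np.inf
    for s in range(S):
        cand = posterior_partitions[s]
        expected_loss = np.mean(
            [binder_loss(cand, posterior_partitions[t]) for t in range(S)]
        )
        # Normalize the loss by the number of pairs so the two terms are on
        # comparable scales; the entropy term penalizes fragmented partitions.
        score = expected_loss / (n * (n - 1) / 2) + lam * partition_entropy(cand)
        if score < best_score:
            best, best_score = cand, score
    return best, best_score
```

Splitting any cluster always increases the entropy of the cluster-size distribution (it is maximized by the all-singletons partition), so the penalty acts purely against sparsely populated clusters, consistent with the post-processing interpretation described in the abstract.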
Pages: 37-60
Number of pages: 24