Entropy regularization in probabilistic clustering

Cited: 0
Authors
Franzolini, Beatrice [1 ]
Rebaudo, Giovanni [2 ,3 ]
Affiliations
[1] Bocconi Univ, Dept Decis Sci, Milan, Italy
[2] Univ Turin, Turin, Italy
[3] Collegio Carlo Alberto, Turin, Italy
Source
STATISTICAL METHODS AND APPLICATIONS | 2024, Vol. 33, No. 1
Keywords
Dirichlet process; Loss functions; Mixture models; Unbalanced clusters; Random partition; DIRICHLET PROCESS; PARTITION DISTRIBUTION; OUTLIER DETECTION; MIXTURE-MODELS; INFERENCE; NUMBER;
DOI
10.1007/s10260-023-00716-y
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
Bayesian nonparametric mixture models are widely used to cluster observations. However, a major drawback of the approach is that the estimated partition often exhibits unbalanced cluster frequencies, with only a few dominating clusters and a large number of sparsely populated ones. This feature translates into results that are often uninterpretable unless we are willing to ignore a relevant number of observations and clusters. Interpreting the posterior distribution as a penalized likelihood, we show how the imbalance can be explained as a direct consequence of the cost functions involved in estimating the partition. In light of our findings, we propose a novel Bayesian estimator of the clustering configuration. The proposed estimator is equivalent to a post-processing procedure that reduces the number of sparsely populated clusters and enhances interpretability. The procedure takes the form of entropy regularization of the Bayesian estimate. While computationally convenient with respect to alternative strategies, it is also theoretically justified as a correction to the Bayesian loss function used for point estimation and, as such, can be applied to any posterior distribution of clusters, regardless of the specific model used.
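The abstract describes the estimator only at a high level; the exact form of the regularized loss is given in the paper itself. As a rough, hypothetical illustration only (not the authors' code), the Python sketch below assumes the penalized objective is the posterior-expected Binder loss plus a multiple lam of the partition entropy, scaled by n, and applies it as a greedy, merge-based post-processing of an initial clustering estimate, in line with the abstract's description of a post-processing step that removes sparsely populated clusters. All function names, the lam * n scaling, and the greedy merge scheme are assumptions.

import numpy as np

def coclustering(samples):
    # samples: (S, n) integer array of posterior cluster-label draws.
    # Returns the n x n matrix of posterior co-clustering probabilities.
    S, n = samples.shape
    P = np.zeros((n, n))
    for z in samples:
        P += z[:, None] == z[None, :]
    return P / S

def expected_binder_loss(labels, P):
    # Posterior-expected Binder loss (equal misclassification costs)
    # of a candidate partition, up to an additive constant.
    same = labels[:, None] == labels[None, :]
    return float(np.sum(same * (1.0 - P) + (~same) * P)) / 2.0

def partition_entropy(labels):
    # Shannon entropy of the relative cluster sizes.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def entropy_regularized_estimate(init_labels, P, lam=1.0):
    # Greedy post-processing: while it lowers the penalized objective
    # expected_binder_loss + lam * n * partition_entropy, merge the
    # currently smallest cluster into its best-matching partner.
    labels = np.asarray(init_labels).copy()
    n = labels.size
    best = expected_binder_loss(labels, P) + lam * n * partition_entropy(labels)
    while True:
        ids, counts = np.unique(labels, return_counts=True)
        if ids.size == 1:
            break
        smallest = ids[np.argmin(counts)]
        merged = False
        for target in ids:
            if target == smallest:
                continue
            cand = np.where(labels == smallest, target, labels)
            obj = expected_binder_loss(cand, P) + lam * n * partition_entropy(cand)
            if obj < best:
                labels, best, merged = cand, obj, True
        if not merged:
            break
    return labels

With lam = 0 the search reduces to greedy merging under the plain expected Binder loss; increasing lam trades a small amount of expected loss for a lower-entropy partition with fewer, larger clusters, since merging any two clusters always decreases the entropy of the cluster-size frequencies.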
Pages: 37-60
Page count: 24
Related papers
50 items in total (items [41]-[50] shown)
  • [41] Probabilistic Fair Clustering
    Esmaeili, Seyed A.
    Brubach, Brian
    Tsepenekas, Leonidas
    Dickerson, John P.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [42] Scalable probabilistic clustering
    Bradley, PS
    Fayyad, UM
    Reina, CA
    COMPLEMENTARITY: APPLICATIONS, ALGORITHMS AND EXTENSIONS, 2001, 50 : 43 - 65
  • [43] Regularization in Probabilistic Inductive Logic Programming
    Gentili, Elisabetta
    Bizzarri, Alice
    Azzolini, Damiano
    Zese, Riccardo
    Riguzzi, Fabrizio
    INDUCTIVE LOGIC PROGRAMMING, ILP 2023, 2023, 14363 : 16 - 29
  • [44] A probabilistic theory of clustering
    Dougherty, ER
    Brun, M
    PATTERN RECOGNITION, 2004, 37 (05) : 917 - 925
  • [45] Probabilistic quantum clustering
    Casana-Eslava, Raul V.
    Lisboa, Paulo J. G.
    Ortega-Martorell, Sandra
    Jarman, Ian H.
    Martin-Guerrero, Jose D.
    KNOWLEDGE-BASED SYSTEMS, 2020, 194
  • [46] Penalized probabilistic clustering
    Lu, Zhengdong
    Leen, Todd K.
    NEURAL COMPUTATION, 2007, 19 (06) : 1528 - 1567
  • [47] An Entropy Regularization k-Means Algorithm with a New Measure of between-Cluster Distance in Subspace Clustering
    Xiong, Liyan
    Wang, Cheng
    Huang, Xiaohui
    Zeng, Hui
    ENTROPY, 2019, 21 (07)
  • [48] Nonextensive entropy and regularization for adaptive learning
    Anastasiadis, AD
    Magoulas, GD
    2004 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2004, : 1067 - 1072
  • [49] Proximal Policy Optimization with Entropy Regularization
    Shen, Yuqing
    2024 4TH INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS, ICCCR 2024, 2024, : 380 - 383
  • [50] Neuro-Symbolic Entropy Regularization
    Ahmed, Kareem
    Wang, Eric
    Chang, Kai-Wei
    Van den Broeck, Guy
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, VOL 180, 2022, 180 : 43 - 53