Entropy regularization in probabilistic clustering

Cited by: 0
Authors
Franzolini, Beatrice [1]
Rebaudo, Giovanni [2, 3]
Affiliations
[1] Bocconi Univ, Dept Decis Sci, Milan, Italy
[2] Univ Turin, Turin, Italy
[3] Collegio Carlo Alberto, Turin, Italy
Source
STATISTICAL METHODS AND APPLICATIONS | 2024 / Vol. 33 / Issue 01
Keywords
Dirichlet process; Loss functions; Mixture models; Unbalanced clusters; Random partition; DIRICHLET PROCESS; PARTITION DISTRIBUTION; OUTLIER DETECTION; MIXTURE-MODELS; INFERENCE; NUMBER;
DOI
10.1007/s10260-023-00716-y
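Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code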
020208 ; 070103 ; 0714 ;
Abstract
Bayesian nonparametric mixture models are widely used to cluster observations. However, one major drawback of the approach is that the estimated partition often presents unbalanced clusters' frequencies with only a few dominating clusters and a large number of sparsely-populated ones. This feature translates into results that are often uninterpretable unless we accept to ignore a relevant number of observations and clusters. Interpreting the posterior distribution as penalized likelihood, we show how the unbalance can be explained as a direct consequence of the cost functions involved in estimating the partition. In light of our findings, we propose a novel Bayesian estimator of the clustering configuration. The proposed estimator is equivalent to a post-processing procedure that reduces the number of sparsely-populated clusters and enhances interpretability. The procedure takes the form of entropy-regularization of the Bayesian estimate. While being computationally convenient with respect to alternative strategies, it is also theoretically justified as a correction to the Bayesian loss function used for point estimation and, as such, can be applied to any posterior distribution of clusters, regardless of the specific model used.
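The idea of entropy-regularizing a Bayesian point estimate of the partition can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the base loss is taken to be the Binder loss, the penalty is the Shannon entropy of the cluster frequencies scaled by a hypothetical tuning parameter `lam`, and the search is restricted to the partitions visited by the posterior sampler. All function names are hypothetical.

```python
import numpy as np


def partition_entropy(labels):
    """Shannon entropy of the cluster frequencies of one partition."""
    _, counts = np.unique(labels, return_counts=True)
    freqs = counts / counts.sum()
    return float(-(freqs * np.log(freqs)).sum())


def coclustering_matrix(samples):
    """Estimate posterior co-clustering probabilities from sampled partitions.

    samples: array of shape (n_draws, n_obs) holding cluster labels.
    """
    samples = np.asarray(samples)
    same = samples[:, :, None] == samples[:, None, :]
    return same.mean(axis=0)


def expected_binder_loss(labels, psm):
    """Posterior expected Binder loss (equal costs) of a candidate partition.

    For each pair i < j the contribution is |1[c_i == c_j] - psm[i, j]|.
    """
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    upper = np.triu_indices(len(labels), k=1)
    return float(np.abs(same - psm)[upper].sum())


def entropy_regularized_estimate(samples, lam):
    """Pick the sampled partition minimizing expected loss + lam * entropy.

    Assumption: candidates are limited to the sampled partitions, and
    lam >= 0 is a hypothetical tuning parameter (lam = 0 gives the
    unregularized Binder estimate over the same candidates).
    """
    samples = np.asarray(samples)
    psm = coclustering_matrix(samples)
    scores = [
        expected_binder_loss(s, psm) + lam * partition_entropy(s)
        for s in samples
    ]
    return samples[int(np.argmin(scores))]
```

Because the penalty only re-scores candidate partitions, the step works as a post-processing of existing posterior draws, whatever model produced them. Larger values of `lam` penalize partitions with many sparsely-populated clusters more heavily.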
Pages: 37-60
Number of pages: 24