Penalized probabilistic clustering

Cited by: 38
Authors
Lu, Zhengdong [1 ]
Leen, Todd K. [1 ]
Affiliation
[1] Oregon Hlth & Sci Inst, Dept Comp Sci & Engn, OGI Sch Sci & Engn, Beaverton, OR 97006 USA
Funding
National Science Foundation (USA); National Aeronautics and Space Administration (NASA)
DOI
10.1162/neco.2007.19.6.1528
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMMs) of the data distribution. We express clustering preferences in a prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. The model parameters are fit with the expectation-maximization (EM) algorithm. Our model provides a flexible framework that encompasses several other semisupervised clustering models as special cases. Experiments on artificial and real-world problems show that our model can consistently improve clustering results when pairwise relations are incorporated. The experiments also demonstrate the superiority of our model over other semisupervised clustering methods in handling noisy pairwise relations.
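The idea of a prior that penalizes assignments violating pairwise preferences can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one plausible form of the prior, in which each assignment is weighted by exp of the summed weights over pairs placed in the same cluster (positive weight for "same cluster", negative for "different clusters"). The GMM parameters are held fixed at hypothetical values rather than fit by EM, and the posterior is computed by brute-force enumeration, which is feasible only for a handful of points. The function names and the `weights` dictionary are illustrative.

```python
import itertools
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def penalized_posteriors(xs, weights, means, variances, mixing):
    """Marginal cluster posteriors under a pairwise-penalized assignment prior.

    xs      : list of 1-D data points
    weights : dict {(i, j): w}; w > 0 prefers the same cluster for i and j,
              w < 0 prefers different clusters (assumed form of the prior)
    means, variances, mixing : fixed GMM parameters (hypothetical values)
    Enumerates all K**N assignments, so it is usable only for tiny N.
    """
    n, k = len(xs), len(means)
    marginals = [[0.0] * k for _ in range(n)]
    total = 0.0
    for z in itertools.product(range(k), repeat=n):
        # Standard GMM term: product over points of pi_z * N(x | mu_z, var_z)
        score = 1.0
        for i, zi in enumerate(z):
            score *= mixing[zi] * gaussian_pdf(xs[i], means[zi], variances[zi])
        # Pairwise term: add the weight of every pair assigned to the same cluster
        score *= math.exp(sum(w for (i, j), w in weights.items() if z[i] == z[j]))
        total += score
        for i, zi in enumerate(z):
            marginals[i][zi] += score
    return [[m / total for m in row] for row in marginals]


# Three points and two clusters; point 1 lies slightly closer to cluster 0.
xs = [0.0, 0.9, 2.0]
means, variances, mixing = [0.0, 2.0], [1.0, 1.0], [0.5, 0.5]

free = penalized_posteriors(xs, {}, means, variances, mixing)
# Soft preference that points 1 and 2 share a cluster.
linked = penalized_posteriors(xs, {(1, 2): 3.0}, means, variances, mixing)

print("unconstrained posterior for point 1:", free[1])
print("with same-cluster preference (1, 2):", linked[1])
```

Without any preference, point 1 leans toward cluster 0. With the soft same-cluster preference linking it to point 2, its posterior shifts toward cluster 1. Because the preference is a finite weight rather than a hard constraint, a smaller weight would shift the posterior less, which is one way to encode varying degrees of certainty.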
Pages: 1528-1567
Page count: 40
Related papers
50 in total
  • [11] Probabilistic Fair Clustering
    Esmaeili, Seyed A.
    Brubach, Brian
    Tsepenekas, Leonidas
    Dickerson, John P.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [12] Scalable probabilistic clustering
    Bradley, PS
    Fayyad, UM
    Reina, CA
    COMPLEMENTARITY: APPLICATIONS, ALGORITHMS AND EXTENSIONS, 2001, 50 : 43 - 65
  • [13] A probabilistic theory of clustering
    Dougherty, ER
    Brun, M
    PATTERN RECOGNITION, 2004, 37 (05) : 917 - 925
  • [14] Probabilistic quantum clustering
    Casana-Eslava, Raul V.
    Lisboa, Paulo J. G.
    Ortega-Martorell, Sandra
    Jarman, Ian H.
    Martin-Guerrero, Jose D.
    KNOWLEDGE-BASED SYSTEMS, 2020, 194
  • [15] Lag penalized weighted correlation for time series clustering
    Thevaa Chandereng
    Anthony Gitter
    BMC Bioinformatics, 21
  • [16] Robust subspace clustering via penalized mixture of Gaussians
    Yao, Jing
    Cao, Xiangyong
    Zhao, Qian
    Meng, Deyu
    Xu, Zongben
    NEUROCOMPUTING, 2018, 278 : 4 - 11
  • [17] Lag penalized weighted correlation for time series clustering
    Chandereng, Thevaa
    Gitter, Anthony
    BMC BIOINFORMATICS, 2020, 21 (01)
  • [18] Penalized model-based clustering of fMRI data
    Dilernia, Andrew
    Quevedo, Karina
    Camchong, Jazmin
    Lim, Kelvin
    Pan, Wei
    Zhang, Lin
    BIOSTATISTICS, 2022, 23 (03) : 825 - 843
  • [19] Rival-penalized competitive clustering: A study and comparison
    Borghese, A. (borghese@di.unimi.it), 1600, Springer Science and Business Media Deutschland GmbH (19):
  • [20] Probabilistic Clustering of Wind Generators
    Ali, Muhammad
    Ilie, Irinel-Sorin
    Milanovic, Jovica V.
    Chicco, Gianfranco
    IEEE POWER AND ENERGY SOCIETY GENERAL MEETING 2010, 2010