Penalized probabilistic clustering

Cited by: 38
Authors
Lu, Zhengdong [1 ]
Leen, Todd K. [1 ]
Affiliation
[1] Oregon Hlth & Sci Inst, Dept Comp Sci & Engn, OGI Sch Sci & Engn, Beaverton, OR 97006 USA
Funding
US National Science Foundation; US National Aeronautics and Space Administration (NASA)
DOI
10.1162/neco.2007.19.6.1528
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in a prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. The model parameters are fit with the expectation-maximization (EM) algorithm. Our model provides a flexible framework that encompasses several other semisupervised clustering models as special cases. Experiments on artificial and real-world problems show that our model can consistently improve clustering results when pairwise relations are incorporated. The experiments also demonstrate the superiority of our model over other semisupervised clustering methods in handling noisy pairwise relations.
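The idea of a prior that penalizes assignments violating pairwise preferences can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible form of such a prior, in which each pair (i, j) contributes a factor exp(W[i][j]) whenever both points share a cluster, with positive W[i][j] encoding a "same cluster" preference and negative W[i][j] a "different cluster" preference. The function names, the weight matrix `W`, the sign convention, and the toy data are all hypothetical. The mixture parameters are held fixed (no EM fitting), and the posterior is computed by brute-force enumeration, which is feasible only for a handful of points.

```python
import itertools
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def penalized_posterior(xs, weights, means, stds, W):
    """Posterior over joint assignments under an assumed pairwise-penalized prior.

    Each joint assignment z is scored by the usual mixture terms plus
    W[i][j] for every pair (i < j) placed in the same cluster.
    Enumerates all k**n assignments, so use only for tiny n.
    """
    n, k = len(xs), len(weights)
    log_scores = {}
    for z in itertools.product(range(k), repeat=n):
        log_s = 0.0
        for i, zi in enumerate(z):
            log_s += math.log(weights[zi])
            log_s += math.log(gaussian_pdf(xs[i], means[zi], stds[zi]))
        for i in range(n):
            for j in range(i + 1, n):
                if z[i] == z[j]:
                    log_s += W[i][j]  # pairwise preference term (assumed form)
        log_scores[z] = log_s
    # Normalize in a numerically stable way.
    m = max(log_scores.values())
    total = sum(math.exp(s - m) for s in log_scores.values())
    return {z: math.exp(s - m) / total for z, s in log_scores.items()}


def prob_same_cluster(posterior, i, j):
    """Posterior probability that points i and j share a cluster."""
    return sum(p for z, p in posterior.items() if z[i] == z[j])


# Toy data: three 1-D points and a fixed two-component mixture.
xs = [-1.0, 1.0, 0.2]
weights, means, stds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]


def pair_weights(w01):
    """Symmetric 3x3 weight matrix with a single preference on pair (0, 1)."""
    W = [[0.0] * 3 for _ in range(3)]
    W[0][1] = W[1][0] = w01
    return W


p_none = prob_same_cluster(
    penalized_posterior(xs, weights, means, stds, pair_weights(0.0)), 0, 1)
p_must = prob_same_cluster(
    penalized_posterior(xs, weights, means, stds, pair_weights(3.0)), 0, 1)
p_cannot = prob_same_cluster(
    penalized_posterior(xs, weights, means, stds, pair_weights(-3.0)), 0, 1)
```

With all weights at zero the sketch reduces to an ordinary mixture posterior; a positive weight on a pair raises the probability that the two points share a cluster, and a negative weight lowers it. Because the preferences enter as soft weights rather than hard constraints, a noisy preference can still be overridden by the data.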
Pages: 1528-1567
Page count: 40
Related papers
50 in total
  • [21] Decentralized Probabilistic Text Clustering
    Papapetrou, Odysseas
    Siberski, Wolf
    Fuhr, Norbert
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (10) : 1848 - 1861
  • [22] Probabilistic clustering of interval data
    Brito, Paula
    Pedro Duarte Silva, A.
    Dias, Jose G.
    INTELLIGENT DATA ANALYSIS, 2015, 19 (02) : 293 - 313
  • [23] A Probabilistic Framework for Relational Clustering
    Long, Bo
    Zhang, Zhongfei
    Yu, Philip S.
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 470 - 479
  • [24] Powered Outer Probabilistic Clustering
    Taraba, Peter
    WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, WCECS 2017, VOL I, 2017, : 394 - 398
  • [25] A probabilistic framework for graph clustering
    Luo, B
    Robles-Kelly, A
    Torsello, A
    Wilson, RC
    Hancock, ER
    2001 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2001, : 912 - 919
  • [26] Entropy regularization in probabilistic clustering
    Franzolini, Beatrice
    Rebaudo, Giovanni
    STATISTICAL METHODS AND APPLICATIONS, 2024, 33 (01): : 37 - 60
  • [27] Probabilistic Temporal Subspace Clustering
    Gholami, Behnam
    Pavlovic, Vladimir
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4313 - 4322
  • [28] A Probabilistic Algorithm for MANET Clustering
    Dabaghi-Zarandi, Fahimeh
    Minaei-Bidgoli, Behrouz
    Davarzani, Zohreh
    INTERNATIONAL JOURNAL OF FUTURE GENERATION COMMUNICATION AND NETWORKING, 2014, 7 (06): : 59 - 67
  • [29] Probabilistic D-clustering
    Ben-Israel, Adi
    Iyigun, Cem
    JOURNAL OF CLASSIFICATION, 2008, 25 (01) : 5 - 26
  • [30] Clustering Large Probabilistic Graphs
    Kollios, George
    Potamias, Michalis
    Terzi, Evimaria
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (02) : 325 - 336