Penalized probabilistic clustering

Times cited: 38
Authors
Lu, Zhengdong [1]
Leen, Todd K. [1]
Affiliation
[1] Dept. of Computer Science & Engineering, OGI School of Science & Engineering, Oregon Health & Science University, Beaverton, OR 97006, USA
Funding
National Science Foundation (NSF); National Aeronautics and Space Administration (NASA)
DOI
10.1162/neco.2007.19.6.1528
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in a prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. The model parameters are fit with the expectation-maximization (EM) algorithm. Our model provides a flexible framework that encompasses several other semisupervised clustering models as special cases. Experiments on artificial and real-world problems show that our model can consistently improve clustering results when pairwise relations are incorporated. The experiments also demonstrate the superiority of our model over other semisupervised clustering methods in handling noisy pairwise relations.
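The abstract describes the model only in words. The following is a minimal, illustrative sketch of the idea, assuming 1-D data, Gaussian components, and a pairwise prior of the form P(z) ∝ Π_i π_{z_i} · Π_{(i,j)} exp(w_ij · [z_i = z_j]), where w_ij > 0 acts as a soft must-link and w_ij < 0 as a soft cannot-link. Because this prior couples the assignments, the exact E-step is intractable in general, so the sketch substitutes a mean-field approximation; the paper's own inference scheme may differ. The function name `penalized_em`, the `weights` dictionary, and the mixing-weight update (which reuses the standard GMM formula and ignores the prior's normalizing constant) are hypothetical simplifications, not the authors' implementation.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def penalized_em(xs, means, weights, n_iter=30, mf_sweeps=5):
    """Sketch of EM for a 1-D Gaussian mixture with pairwise penalties in the prior.

    xs:      list of scalar data points
    means:   initial component means, one per cluster
    weights: hypothetical dict {(i, j): w}; w > 0 rewards putting points i and j
             in the same cluster (must-link), w < 0 penalizes it (cannot-link)
    Returns (q, means, variances, mix), where q[i][c] approximates P(z_i = c | data).
    """
    n, k = len(xs), len(means)
    means = list(means)
    variances = [1.0] * k
    mix = [1.0 / k] * k

    # Symmetric neighbour lists so each penalty acts on both endpoints of a pair.
    nbrs = [[] for _ in range(n)]
    for (i, j), w in weights.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    q = [[1.0 / k] * k for _ in range(n)]
    for _ in range(n_iter):
        # Approximate E-step (mean field, an assumption of this sketch):
        # q_i(c) is proportional to mix_c * N(x_i | mean_c, var_c) * exp(sum_j w_ij * q_j(c))
        for _ in range(mf_sweeps):
            for i in range(n):
                logits = [
                    math.log(mix[c])
                    + log_normal_pdf(xs[i], means[c], variances[c])
                    + sum(w * q[j][c] for j, w in nbrs[i])
                    for c in range(k)
                ]
                top = max(logits)  # subtract the max before exponentiating for stability
                exps = [math.exp(v - top) for v in logits]
                total = sum(exps)
                q[i] = [e / total for e in exps]

        # M-step: responsibility-weighted Gaussian updates (floors avoid log(0) and /0).
        for c in range(k):
            nc = max(sum(q[i][c] for i in range(n)), 1e-12)
            mix[c] = max(nc / n, 1e-12)
            means[c] = sum(q[i][c] * xs[i] for i in range(n)) / nc
            variances[c] = max(
                sum(q[i][c] * (xs[i] - means[c]) ** 2 for i in range(n)) / nc, 1e-3
            )
    return q, means, variances, mix


if __name__ == "__main__":
    data = [0.0, 0.2, -0.2, 5.0, 5.2, 4.8, 2.5]
    # Soft cannot-link between the ambiguous point 6 (x = 2.5) and point 0 (x = 0.0).
    q, _, _, _ = penalized_em(data, [0.0, 5.0], {(0, 6): -5.0})
    print([round(row[1], 3) for row in q])
```

In this toy run the point at 2.5 lies halfway between the two groups, and the cannot-link weight pushes it away from the cluster that holds point 0. The magnitude of w_ij plays the role of the "varying degrees of certainty" mentioned in the abstract: a large |w_ij| approaches a hard constraint, while a small one only nudges the assignment.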
Pages: 1528-1567
Page count: 40