Penalized probabilistic clustering

Cited by: 38
Authors
Lu, Zhengdong [1]
Leen, Todd K. [1]
Affiliations
[1] Oregon Hlth & Sci Inst, Dept Comp Sci & Engn, OGI Sch Sci & Engn, Beaverton, OR 97006 USA
Funding
National Science Foundation (USA); National Aeronautics and Space Administration (NASA);
Keywords
DOI
10.1162/neco.2007.19.6.1528
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in a prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. The model parameters are fit with the expectation-maximization (EM) algorithm. Our model provides a flexible framework that encompasses several other semisupervised clustering models as special cases. Experiments on artificial and real-world problems show that our model can consistently improve clustering results when pairwise relations are incorporated. The experiments also demonstrate the superiority of our model over other semisupervised clustering methods in handling noisy pairwise relations.
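The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything beyond the abstract is an assumption: the prior over assignments is taken to be proportional to the usual mixing weights times exp(W[i, j]) for every pair placed in the same cluster, the E-step is approximated with a few parallel mean-field sweeps, and the mixing-weight update ignores the prior's normalizing constant. The names `penalized_gmm`, `W`, `mf_sweeps`, and `reg` are hypothetical.

```python
import numpy as np

def softmax(a):
    """Row-wise softmax, shifted for numerical stability."""
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def log_gauss(X, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, (X - mean).T)      # shape (d, n)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + np.sum(sol ** 2, axis=0))

def penalized_gmm(X, W, means, n_iter=50, mf_sweeps=5, reg=1e-6):
    """EM for a GMM whose assignment prior encodes pairwise preferences.

    W is symmetric with a zero diagonal (assumed form of the preferences):
    W[i, j] > 0 favors putting i and j in the same cluster, W[i, j] < 0
    penalizes it, and |W[i, j]| expresses the degree of certainty.
    `means` is a (k, d) initial guess for the component means.
    """
    n, d = X.shape
    means = np.array(means, dtype=float)
    k = means.shape[0]
    # Start every component from the overall data covariance.
    covs = np.stack([np.cov(X.T).reshape(d, d) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Unpenalized scores: log pi_c + log N(x_i | mu_c, Sigma_c).
        base = np.stack(
            [np.log(weights[c]) + log_gauss(X, means[c], covs[c]) for c in range(k)],
            axis=1,
        )
        # Approximate E-step (assumption): parallel mean-field sweeps in which
        # each point's score for cluster c is shifted by sum_j W[i, j] * q[j, c].
        q = softmax(base)
        for _ in range(mf_sweeps):
            q = softmax(base + W @ q)
        # M-step: standard weighted GMM updates using the responsibilities q.
        nk = q.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (q.T @ X) / nk[:, None]
        for c in range(k):
            diff = X - means[c]
            covs[c] = (q[:, c, None] * diff).T @ diff / nk[c] + reg * np.eye(d)
    return q, weights, means, covs
```

With `W` set to all zeros the sketch reduces to ordinary EM for a GMM. Out-of-sample points can then be assigned with the usual GMM posterior computed from the fitted `weights`, `means`, and `covs`, which is how the pairwise relations in the training set would influence new data under these assumptions.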
Pages: 1528-1567
Page count: 40
Related papers
50 in total
  • [41] Unsupervised image segmentation using penalized fuzzy clustering algorithm
    Yang, Y
    Zhang, F
    Zheng, CX
    Lin, P
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2005, PROCEEDINGS, 2005, 3578 : 71 - 77
  • [42] Penalized model-based clustering of complex functional data
    Pronello, Nicola
    Ignaccolo, Rosaria
    Ippoliti, Luigi
    Fontanella, Sara
    STATISTICS AND COMPUTING, 2023, 33 (06)
  • [43] A New Algorithm and Theory for Penalized Regression-based Clustering
    Wu, Chong
    Kwon, Sunghoon
    Shen, Xiaotong
    Pan, Wei
    JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17 : 1 - 25
  • [44] Clustering with missing features: a penalized dissimilarity measure based approach
    Datta, Shounak
    Bhattacharjee, Supritam
    Das, Swagatam
    MACHINE LEARNING, 2018, 107 (12) : 1987 - 2025
  • [45] Rival penalized competitive learning, finite mixture, and multisets clustering
    Xu, L
    IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, 1998, : 2525 - 2530
  • [46] Clustering of longitudinal curves via a penalized method and EM algorithm
    Wang, Xin
    COMPUTATIONAL STATISTICS, 2024, 39 (03) : 1485 - 1512
  • [48] Nonsmooth Penalized Clustering via lp Regularized Sparse Regression
    Niu, Lingfeng
    Zhou, Ruizhi
    Tian, Yingjie
    Qi, Zhiquan
    Zhang, Peng
    IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (06) : 1423 - 1433
  • [49] Exploring dimension learning via a penalized probabilistic principal component analysis
    Deng, Wei Q.
    Craiu, Radu, V
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2023, 93 (02) : 266 - 297
  • [50] Probabilistic electricity price forecasting based on penalized temporal fusion transformer
    Jiang, He
    Pan, Sheng
    Dong, Yao
    Wang, Jianzhou
    JOURNAL OF FORECASTING, 2024, 43 (05) : 1465 - 1491