Model Based Penalized Clustering for Multivariate Data

Authors
Ghosh, Samiran [1]
Dey, Dipak K. [2]
Affiliations
[1] Indiana Univ Purdue Univ, Dept Math Sci, Indianapolis, IN 46202 USA
[2] Univ Connecticut, Dept Stat, Storrs, CT 06269 USA
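Keywords
DOI
Not available
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;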
Abstract
A variety of clustering algorithms have evolved over the last decade. However, one of the simplest (and possibly overused) partition-based clustering algorithms is K-means. It can be shown that the computational complexity of K-means does not grow exponentially with dimensionality; rather, it is linearly proportional to the number of observations and the number of clusters. The crucial requirements are knowledge of the number of clusters and the computation of a suitably chosen similarity measure. Because of this simplicity and scalability on large data sets, K-means remains an attractive alternative to competing clustering philosophies, especially in high-dimensional domains. However, being a deterministic algorithm, traditional K-means has several drawbacks. It offers only a hard decision rule, with no probabilistic interpretation. In this paper we develop a decision-theoretic framework that gives traditional K-means a probabilistic footing. This not only enables soft clustering; the whole optimization problem can also be recast in a Bayesian modeling framework in which the number of clusters is treated as an unknown parameter of interest, thus removing a severe constraint of the K-means algorithm. Our basic idea is to keep the simplicity and scalability of K-means while achieving some of the desired properties of other model-based or soft clustering approaches.
Pages: 53 / +
Number of pages: 3
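The contrast the abstract draws between a hard decision rule and soft clustering can be illustrated with a small sketch. This is not the authors' decision-theoretic or Bayesian method, which the abstract does not spell out. It is plain Lloyd-style K-means followed by an assumed soft assignment (a softmax over negative squared distances). The function names, the `beta` sharpness parameter, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_kmeans(points, k, iters=50, seed=0):
    """Lloyd-style K-means: every point gets exactly one cluster label."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: cost is linear in the number of points and clusters.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


def soft_memberships(points, centers, beta=1.0):
    """Assumed soft rule: softmax of -beta * squared distance to each center."""
    result = []
    for p in points:
        scores = [-beta * sq_dist(p, c) for c in centers]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        result.append([w / total for w in weights])
    return result


# Toy data (hypothetical): two tight groups and one point between them.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
          (2.5, 2.5)]
centers, labels = hard_kmeans(points, k=2)
memberships = soft_memberships(points, centers)
```

Here `labels` holds one cluster index per point, while each row of `memberships` is a set of weights that sum to one. Note that `k` is still fixed in advance in this sketch, which is exactly the constraint the paper proposes to relax by treating the number of clusters as an unknown parameter.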