Model Based Penalized Clustering for Multivariate Data

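Cited by: 0
Authors
Ghosh, Samiran [1]
Dey, Dipak K. [2]
Affiliations
[1] Indiana Univ Purdue Univ, Dept Math Sci, Indianapolis, IN 46202 USA
[2] Univ Connecticut, Dept Stat, Storrs, CT 06269 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract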
Over the last decade a variety of clustering algorithms have evolved. However, one of the simplest (and possibly overused) partition-based clustering algorithms is K-means. It can be shown that the computational complexity of K-means does not grow exponentially with dimensionality; rather, it is linear in the number of observations and the number of clusters. The crucial requirements are knowledge of the number of clusters and the computation of a suitably chosen similarity measure. Because of this simplicity and scalability, K-means remains an attractive alternative to competing clustering philosophies for large data sets, especially in high-dimensional domains. However, being a deterministic algorithm, traditional K-means has several drawbacks. It offers only a hard decision rule, with no probabilistic interpretation. In this paper we develop a decision-theoretic framework by which traditional K-means can be given a probabilistic footing. This not only enables soft clustering; the whole optimization problem can also be recast in a Bayesian modeling framework, in which the number of clusters is treated as an unknown parameter of interest, thus removing a severe constraint of the K-means algorithm. Our basic idea is to keep the simplicity and scalability of K-means while achieving some of the desired properties of other model-based or soft clustering approaches.
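The contrast the abstract draws between a hard decision rule and soft clustering can be illustrated with a minimal sketch. This is not the authors' decision-theoretic framework, which the abstract does not specify. It assumes squared Euclidean distance as the dissimilarity measure and uses a softmax over negative distances as one common way to obtain soft memberships; the function names and the `beta` sharpness parameter are hypothetical. Each assignment pass costs on the order of n × K × d operations, which is linear in the number of observations n and the number of clusters K.

```python
import numpy as np

def squared_distances(X, centers):
    """Return the (n, K) matrix of squared Euclidean distances."""
    diff = X[:, None, :] - centers[None, :, :]  # shape (n, K, d) via broadcasting
    return np.sum(diff ** 2, axis=2)

def hard_assign(X, centers):
    """Classical K-means step: each point goes to its single nearest center."""
    return np.argmin(squared_distances(X, centers), axis=1)

def soft_assign(X, centers, beta=1.0):
    """Illustrative soft memberships: softmax of -beta * squared distance.

    Assumption: this corresponds to equal-weight spherical Gaussian
    components with a shared variance; it is not taken from the paper.
    """
    logits = -beta * squared_distances(X, centers)
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy data: the last point is exactly halfway between the two centers.
X = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [2.0, 2.0]])
centers = np.array([[0.0, 0.0], [4.0, 4.0]])

labels = hard_assign(X, centers)   # forced to pick one cluster per point
resp = soft_assign(X, centers)     # each row is a membership distribution
```

For the equidistant point, the hard rule must still choose a single cluster (`np.argmin` breaks the tie in favour of the first center), whereas the soft memberships split evenly between the two clusters. That difference is the kind of probabilistic interpretation the abstract says traditional K-means lacks.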
Pages: 53 / +
Number of pages: 3