Model Based Penalized Clustering for Multivariate Data

Times cited: 0
Authors
Ghosh, Samiran [1]
Dey, Dipak K. [2]
Affiliations
[1] Indiana Univ Purdue Univ, Dept Math Sci, Indianapolis, IN 46202 USA
[2] Univ Connecticut, Dept Stat, Storrs, CT 06269 USA
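DOI
Not available
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract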
Over the last decade, a variety of clustering algorithms have evolved. However, one of the simplest (and possibly most overused) partition-based clustering algorithms is K-means. It can be shown that the computational complexity of K-means does not grow exponentially with dimensionality; rather, it is linear in the number of observations and in the number of clusters. Its crucial requirements are knowledge of the number of clusters and the computation of a suitably chosen similarity measure. Because of this simplicity and scalability, K-means remains an attractive alternative to competing clustering philosophies for large data sets, especially in high-dimensional domains. However, being a deterministic algorithm, traditional K-means has several drawbacks: it offers only a hard decision rule, with no probabilistic interpretation. In this paper we develop a decision-theoretic framework through which traditional K-means can be given a probabilistic footing. This not only enables soft clustering; the whole optimization problem can also be recast into a Bayesian modeling framework in which the number of clusters is treated as an unknown parameter of interest, thus removing a severe constraint of the K-means algorithm. Our basic idea is to keep the simplicity and scalability of K-means while achieving some of the desirable properties of other model-based or soft clustering approaches.
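To make the abstract's two technical points concrete (a per-iteration cost that is linear in the number of observations and clusters, and the contrast between a hard decision rule and a soft one), here is a minimal pure-Python sketch. It is standard Lloyd's K-means plus a generic softmax-style soft assignment; it is NOT the authors' decision-theoretic or Bayesian method. The function names `kmeans` and `soft_assign` and the inverse-temperature parameter `beta` are hypothetical, chosen only for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two d-dimensional points: O(d) work."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means (hypothetical helper). One iteration costs O(n * k * d)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct data points as seeds
    labels = None
    for _ in range(iters):
        # Assignment step (hard decision rule): each point gets exactly one cluster,
        # the index of its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center where it is
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels


def soft_assign(point, centers, beta=1.0):
    """Hypothetical soft rule: membership in cluster j is proportional to
    exp(-beta * ||x - c_j||^2).

    This equals the posterior membership under an equal-weight isotropic Gaussian
    mixture with shared variance 1 / (2 * beta); as beta grows it approaches the
    hard nearest-center rule used by kmeans().
    """
    logits = [-beta * sq_dist(point, c) for c in centers]
    top = max(logits)  # subtract the max before exp() for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, labels = kmeans(points, k=2)
print(labels)  # hard labels: one cluster index per point
print(soft_assign((2.5, 2.5), centers, beta=0.1))  # mixed membership for a midpoint
```

The hard labels carry no measure of uncertainty, whereas `soft_assign` returns a probability vector for each point; note that both still require `k` to be fixed in advance, which is the constraint the paper's Bayesian treatment aims to remove.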
Pages: 53 / +
Number of pages: 3
Related papers (50 in total)
  • [21] Error Covariance Penalized Regression: A novel multivariate model combining penalized regression with multivariate error structure
    Allegrini, Franco
    Braga, Jez W. B.
    Moreira, Alessandro C. O.
    Olivieri, Alejandro C.
    [J]. ANALYTICA CHIMICA ACTA, 2018, 1011 : 20 - 27
  • [22] Model-based clustering of multivariate skew data with circular components and missing values
    Lagona, Francesco
    Picone, Marco
    [J]. JOURNAL OF APPLIED STATISTICS, 2012, 39 (05) : 927 - 945
  • [23] A Model Selection Algorithm For Mixture Model Clustering Of Heterogeneous Multivariate Data
    Erol, Hamza
    [J]. 2013 IEEE INTERNATIONAL SYMPOSIUM ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS (IEEE INISTA), 2013,
  • [24] Fuzzy Clustering based on α-Divergence for Spherical Data and for Categorical Multivariate Data
    Kanzawa, Yuchi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2015), 2015,
  • [25] Clustering of multivariate geostatistical data
    Fouedjio, Francky
    [J]. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2020, 12 (05):
  • [26] Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm
    Biernacki, Christophe
    Jacques, Julien
    [J]. STATISTICS AND COMPUTING, 2016, 26 : 929 - 943
  • [27] Penalized Model-Based Clustering with Group-Dependent Shrinkage Estimation
    Casa, Alessandro
    Cappozzo, Andrea
    Fop, Michael
    [J]. BUILDING BRIDGES BETWEEN SOFT AND STATISTICAL METHODOLOGIES FOR DATA SCIENCE, 2023, 1433 : 73 - 78
  • [28] Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering
    Casa, Alessandro
    Cappozzo, Andrea
    Fop, Michael
    [J]. JOURNAL OF CLASSIFICATION, 2022, 39 : 648 - 674
  • [29] Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm
    Biernacki, Christophe
    Jacques, Julien
    [J]. STATISTICS AND COMPUTING, 2016, 26 (05) : 929 - 943
  • [30] Multivariate Wind Turbine Power Curve Model Based on Data Clustering and Polynomial LASSO Regression
    Astolfi, Davide
    Pandit, Ravi
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (01):