Escaping The Curse of Dimensionality in Bayesian Model-Based Clustering

Cited by: 0
Authors
Chandra, Noirrit Kiran [1]
Canale, Antonio [2]
Dunson, David B. [3]
Affiliations
[1] Univ Texas Dallas, Dept Math Sci, Richardson, TX 75080 USA
[2] Univ Padua, Dept Stat Sci, Padua, Italy
[3] Duke Univ, Dept Stat Sci, Durham, NC USA
Keywords
Big data; Clustering; Dirichlet process; Exchangeable partition probability function; High dimensional; Latent variables; Mixture model; Covariance matrix estimation
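DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812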
Abstract
Bayesian mixture models are widely used for clustering of high-dimensional data with appropriate uncertainty quantification. However, as the dimension of the observations increases, posterior inference often tends to favor too many or too few clusters. This article explains this behavior by studying the random partition posterior in a non-standard setting with a fixed sample size and increasing data dimensionality. We provide conditions under which the finite sample posterior tends to either assign every observation to a different cluster or all observations to the same cluster as the dimension grows. Interestingly, the conditions do not depend on the choice of clustering prior, as long as all possible partitions of observations into clusters have positive prior probabilities, and hold irrespective of the true data-generating model. We then propose a class of latent mixtures for Bayesian clustering (Lamb) on a set of low-dimensional latent variables inducing a partition on the observed data. The model is amenable to scalable posterior inference and we show that it can avoid the pitfalls of high-dimensionality under mild assumptions. The proposed approach is shown to have good performance in simulation studies and an application to inferring cell types based on scRNAseq.
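The idea of clustering on low-dimensional latent variables can be illustrated with a small simulation. The sketch below is not the authors' model or code: it assumes a linear-Gaussian link between the latent and observed spaces, which the abstract does not specify, and the function name `simulate_latent_mixture` and all of its parameters are hypothetical. It only shows how a mixture placed on d-dimensional latent variables induces a partition of p-dimensional observations when d is much smaller than p.

```python
import numpy as np


def simulate_latent_mixture(n=60, p=500, d=2, k=3, noise_sd=1.0, seed=0):
    """Draw n observations in p dimensions whose clusters live in d latent dimensions.

    Hypothetical illustration: the linear-Gaussian form used here is an
    assumption, not something stated in the abstract.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)              # cluster label of each observation
    centers = rng.normal(scale=4.0, size=(k, d))     # cluster means in the latent space
    eta = centers[labels] + rng.normal(size=(n, d))  # low-dimensional latent variables
    loadings = rng.normal(size=(p, d))               # map from latent to observed space
    y = eta @ loadings.T + noise_sd * rng.normal(size=(n, p))  # observed data
    return y, eta, labels


y, eta, labels = simulate_latent_mixture()
print(y.shape, eta.shape)  # (60, 500) (60, 2)
```

Two observations fall in the same cluster exactly when their latent variables share a mixture component, so a clustering method applied in this setting would work with the 2-dimensional `eta` rather than the 500-dimensional `y`.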
Pages: 42