Escaping the Curse of Dimensionality in Bayesian Model-Based Clustering

Cited: 0
Authors:
Chandra, Noirrit Kiran [1 ]
Canale, Antonio [2 ]
Dunson, David B. [3 ]
Affiliations:
[1] Univ Texas Dallas, Dept Math Sci, Richardson, TX 75080 USA
[2] Univ Padua, Dept Stat Sci, Padua, Italy
[3] Duke Univ, Dept Stat Sci, Durham, NC USA
Keywords: Big data; Clustering; Dirichlet process; Exchangeable partition probability function; High dimensional; Latent variables; Mixture model; Covariance-matrix estimation
DOI: Not available
CLC Number: TP [Automation Technology; Computer Technology]
Discipline Code: 0812
Abstract
Bayesian mixture models are widely used for clustering of high-dimensional data with appropriate uncertainty quantification. However, as the dimension of the observations increases, posterior inference often tends to favor too many or too few clusters. This article explains this behavior by studying the random partition posterior in a non-standard setting with a fixed sample size and increasing data dimensionality. We provide conditions under which the finite sample posterior tends to either assign every observation to a different cluster or all observations to the same cluster as the dimension grows. Interestingly, the conditions do not depend on the choice of clustering prior, as long as all possible partitions of observations into clusters have positive prior probabilities, and hold irrespective of the true data-generating model. We then propose a class of latent mixtures for Bayesian clustering (Lamb) on a set of low-dimensional latent variables inducing a partition on the observed data. The model is amenable to scalable posterior inference, and we show that it can avoid the pitfalls of high dimensionality under mild assumptions. The proposed approach is shown to have good performance in simulation studies and in an application to inferring cell types from scRNAseq data.
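As a rough illustration of the latent-variable construction described in the abstract, the sketch below simulates high-dimensional data whose cluster structure lives in a low-dimensional latent space, which is the setting Lamb targets. All names and sizes here (n, p, d, K, Lam, eta) are illustrative assumptions, and the simple symmetric cluster prior and Gaussian kernels stand in for the paper's actual nonparametric specification; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, not taken from the paper):
# n samples, p observed dimensions, d latent dimensions, K clusters.
n, p, d, K = 200, 1000, 5, 3

# Cluster labels from a simple symmetric prior; in a Lamb-style model a
# nonparametric clustering prior (e.g., a Dirichlet process) would play
# this role on the latent variables.
z = rng.integers(0, K, size=n)

# Cluster-specific means for the low-dimensional latent variables eta_i.
mu = rng.normal(scale=3.0, size=(K, d))
eta = mu[z] + rng.normal(size=(n, d))

# Shared loading matrix mapping the latent space to the observed space.
Lam = rng.normal(scale=1.0 / np.sqrt(d), size=(p, d))
sigma = 0.5  # idiosyncratic noise standard deviation

# Observed high-dimensional data: y_i = Lam @ eta_i + Gaussian noise.
# The partition is well defined among the d-dimensional eta_i even
# though the observed data live in p >> d dimensions.
Y = eta @ Lam.T + rng.normal(scale=sigma, size=(n, p))
print(Y.shape)  # (200, 1000)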
Pages: 42
Related Papers (50 total; entries [41]-[50] shown)
  • [41] Simultaneous Clustering and Dimensionality Reduction Using Variational Bayesian Mixture Model
    Watanabe, Kazuho
    Akaho, Shotaro
    Omachi, Shinichiro
    Okada, Masato
    CLASSIFICATION AS A TOOL FOR RESEARCH, 2010, : 81 - 89
  • [42] Overcoming the Curse of Dimensionality When Clustering Multivariate Volume Data
    Molchanov, Vladimir
    Linsen, Lars
    VISIGRAPP 2018: PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS / INTERNATIONAL CONFERENCE ON INFORMATION VISUALIZATION THEORY AND APPLICATIONS (IVAPP), VOL 3, 2018, : 29 - 39
  • [43] Fighting the curse of dimensionality with local model networks
    Belz, Julian
    AT-AUTOMATISIERUNGSTECHNIK, 2019, 67 (10) : 889 - 890
  • [44] Escaping the curse of dimensionality in similarity learning: Efficient Frank-Wolfe algorithm and generalization bounds
    Liu, Kuan
    Bellet, Aurelien
    NEUROCOMPUTING, 2019, 333 : 185 - 199
  • [45] Probability of misclassification in model-based clustering
Zhu, Xuwen
COMPUTATIONAL STATISTICS, 2019, 34 : 1427 - 1442
  • [46] Model-based clustering for random hypergraphs
Ng, Tin Lok James
Murphy, Thomas Brendan
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2022, 16 : 691 - 723
  • [47] Model-based clustering for populations of networks
    Signorelli, Mirko
    Wit, Ernst C.
    STATISTICAL MODELLING, 2020, 20 (01) : 9 - 29
  • [48] Model-based clustering of longitudinal data
    McNicholas, Paul D.
    Murphy, T. Brendan
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2010, 38 (01): : 153 - 168
  • [49] Boosting for model-based data clustering
    Saffari, Amir
    Bischof, Horst
    PATTERN RECOGNITION, 2008, 5096 : 51 - 60
  • [50] Dimension reduction for model-based clustering
Scrucca, Luca
STATISTICS AND COMPUTING, 2010, 20 : 471 - 484