Escaping The Curse of Dimensionality in Bayesian Model-Based Clustering

Cited by: 0
Authors
Chandra, Noirrit Kiran [1 ]
Canale, Antonio [2 ]
Dunson, David B. [3 ]
Affiliations
[1] Univ Texas Dallas, Dept Math Sci, Richardson, TX 75080 USA
[2] Univ Padua, Dept Stat Sci, Padua, Italy
[3] Duke Univ, Dept Stat Sci, Durham, NC USA
Keywords
Big data; Clustering; Dirichlet process; Exchangeable partition probability function; High dimensional; Latent variables; Mixture model; Covariance-matrix estimation
DOI: Not available
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Subject classification code: 0812
Abstract
Bayesian mixture models are widely used for clustering of high-dimensional data with appropriate uncertainty quantification. However, as the dimension of the observations increases, posterior inference often tends to favor too many or too few clusters. This article explains this behavior by studying the random partition posterior in a non-standard setting with a fixed sample size and increasing data dimensionality. We provide conditions under which the finite sample posterior tends to either assign every observation to a different cluster or all observations to the same cluster as the dimension grows. Interestingly, the conditions do not depend on the choice of clustering prior, as long as all possible partitions of observations into clusters have positive prior probabilities, and hold irrespective of the true data-generating model. We then propose a class of latent mixtures for Bayesian clustering (Lamb) on a set of low-dimensional latent variables inducing a partition on the observed data. The model is amenable to scalable posterior inference and we show that it can avoid the pitfalls of high dimensionality under mild assumptions. The proposed approach is shown to have good performance in simulation studies and an application to inferring cell types based on scRNAseq.
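To make the abstract's key construction concrete: the Lamb approach places the mixture (and hence the clustering) on low-dimensional latent variables, with the high-dimensional observations generated through a factor-model-style map. The sketch below is a minimal generative simulation of this idea, not the authors' implementation; the sizes n, p, d, K, the mixture weights, the loading matrix Lambda, and the noise scales are illustrative assumptions, and a finite Gaussian mixture stands in for the nonparametric prior on the latent variables used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed, not from the paper): n samples, p observed
# dimensions, d latent dimensions, K latent clusters.
n, p, d, K = 200, 1000, 5, 3

# Finite Gaussian mixture on the latent space, standing in for the
# nonparametric clustering prior described in the abstract.
weights = np.array([0.5, 0.3, 0.2])
mu = rng.normal(scale=3.0, size=(K, d))      # cluster-specific latent means

# Cluster labels and low-dimensional latent variables eta_i.
z = rng.choice(K, size=n, p=weights)
eta = mu[z] + rng.normal(size=(n, d))

# Factor loadings map the latent space to the observed high-dimensional space.
Lambda = rng.normal(size=(p, d))

# Observed data: y_i = Lambda @ eta_i + noise. The partition is defined by z,
# i.e., by the latent eta_i, not directly by the p-dimensional y_i.
y = eta @ Lambda.T + rng.normal(scale=0.5, size=(n, p))

print(y.shape)           # (200, 1000)
print(np.bincount(z))    # cluster sizes induced on the observations
```

Because the partition is attached to the d-dimensional latent variables, growing the observed dimension p only enlarges the loading matrix rather than the clustering problem itself, which is the intuition behind the paper's claim that the latent-variable formulation avoids the high-dimensional pitfalls under mild assumptions.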
Pages: 42
Related Papers
50 records in total
  • [1] Breaking the curse of dimensionality: hierarchical Bayesian network model for multi-view clustering
    Njah, Hasna
    Jamoussi, Salma
    Mahdi, Walid
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2021, 89 (10-11) : 1013 - 1033
  • [2] Bayesian model-based clustering procedures
    Lau, John W.
    Green, Peter J.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2007, 16 (03) : 526 - 558
  • [3] Model-based Bayesian clustering (MBBC)
    Joo, Yongsung
    Booth, James G.
    Namkoong, Younghwan
    Casella, George
    BIOINFORMATICS, 2008, 24 (06) : 874 - 875
  • [4] Parsimonious Bayesian model-based clustering with dissimilarities
    Morrissette, Samuel
    Muthukumarana, Saman
    Turgeon, Maxime
    MACHINE LEARNING WITH APPLICATIONS, 2024, 15
  • [5] Model-based clustering with dissimilarities: A Bayesian approach
    Oh, Man-Suk
    Raftery, Adrian E.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2007, 16 (03) : 559 - 585
  • [6] Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information
    Xu, Yichong
    Balakrishnan, Sivaraman
    Singh, Aarti
    Dubrawski, Artur
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [7] Escaping the Curse of Dimensionality in Estimating Multivariate Transfer Entropy
    Runge, Jakob
    Heitzig, Jobst
    Petoukhov, Vladimir
    Kurths, Juergen
    PHYSICAL REVIEW LETTERS, 2012, 108 (25)
  • [8] Model-based Clustering with Noise: Bayesian Inference and Estimation
    Bensmail, H.
    Meulman, J. J.
    JOURNAL OF CLASSIFICATION, 2003, 20 : 49 - 76