Escaping the Curse of Dimensionality in Bayesian Model-Based Clustering

Cited: 0
Authors:
Chandra, Noirrit Kiran [1 ]
Canale, Antonio [2 ]
Dunson, David B. [3 ]
Affiliations:
[1] Univ Texas Dallas, Dept Math Sci, Richardson, TX 75080 USA
[2] Univ Padua, Dept Stat Sci, Padua, Italy
[3] Duke Univ, Dept Stat Sci, Durham, NC USA
Keywords: Big data; Clustering; Dirichlet process; Exchangeable partition probability function; High dimensional; Latent variables; Mixture model; Covariance-matrix estimation
DOI: Not available
CLC Number: TP [Automation Technology; Computer Technology]
Discipline Code: 0812
Abstract
Bayesian mixture models are widely used for clustering of high-dimensional data with appropriate uncertainty quantification. However, as the dimension of the observations increases, posterior inference often tends to favor too many or too few clusters. This article explains this behavior by studying the random partition posterior in a non-standard setting with a fixed sample size and increasing data dimensionality. We provide conditions under which the finite sample posterior tends to either assign every observation to a different cluster or all observations to the same cluster as the dimension grows. Interestingly, the conditions do not depend on the choice of clustering prior, as long as all possible partitions of observations into clusters have positive prior probabilities, and hold irrespective of the true data-generating model. We then propose a class of latent mixtures for Bayesian clustering (Lamb) on a set of low-dimensional latent variables inducing a partition on the observed data. The model is amenable to scalable posterior inference, and we show that it can avoid the pitfalls of high dimensionality under mild assumptions. The proposed approach is shown to have good performance in simulation studies and in an application to inferring cell types from scRNAseq data.
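As a rough illustration of the latent-variable construction described in the abstract, the sketch below simulates high-dimensional data whose cluster structure lives in a low-dimensional latent space, which is the setting Lamb targets. All names and sizes here (n, p, d, K, Lam, eta) are illustrative assumptions, and the simple symmetric cluster prior and Gaussian kernels stand in for the paper's actual nonparametric specification; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, not taken from the paper):
# n samples, p observed dimensions, d latent dimensions, K clusters.
n, p, d, K = 200, 1000, 5, 3

# Cluster labels from a simple symmetric prior; in a Lamb-style model a
# nonparametric clustering prior (e.g., a Dirichlet process) would play
# this role on the latent variables.
z = rng.integers(0, K, size=n)

# Cluster-specific means for the low-dimensional latent variables eta_i.
mu = rng.normal(scale=3.0, size=(K, d))
eta = mu[z] + rng.normal(size=(n, d))

# Shared loading matrix mapping the latent space to the observed space.
Lam = rng.normal(scale=1.0 / np.sqrt(d), size=(p, d))
sigma = 0.5  # idiosyncratic noise standard deviation

# Observed high-dimensional data: y_i = Lam @ eta_i + Gaussian noise.
# The partition is well defined among the d-dimensional eta_i even
# though the observed data live in p >> d dimensions.
Y = eta @ Lam.T + rng.normal(scale=sigma, size=(n, p))
print(Y.shape)  # (200, 1000)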
Pages: 42
Related Papers (50 total; entries [41]-[50] shown)
  • [41] Simultaneous Clustering and Dimensionality Reduction Using Variational Bayesian Mixture Model
    Watanabe, Kazuho
    Akaho, Shotaro
    Omachi, Shinichiro
    Okada, Masato
    CLASSIFICATION AS A TOOL FOR RESEARCH, 2010, : 81 - 89
  • [42] Overcoming the Curse of Dimensionality When Clustering Multivariate Volume Data
    Molchanov, Vladimir
    Linsen, Lars
    VISIGRAPP 2018: PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS / INTERNATIONAL CONFERENCE ON INFORMATION VISUALIZATION THEORY AND APPLICATIONS (IVAPP), VOL 3, 2018, : 29 - 39
  • [43] Fighting the curse of dimensionality with local model networks
    Belz, Julian
    AT-AUTOMATISIERUNGSTECHNIK, 2019, 67 (10) : 889 - 890
  • [44] Escaping the curse of dimensionality in similarity learning: Efficient Frank-Wolfe algorithm and generalization bounds
    Liu, Kuan
    Bellet, Aurelien
    NEUROCOMPUTING, 2019, 333 : 185 - 199
  • [45] Probability of misclassification in model-based clustering
Zhu, Xuwen
COMPUTATIONAL STATISTICS, 2019, 34 : 1427 - 1442
  • [46] Model-based clustering for random hypergraphs
Ng, Tin Lok James
Murphy, Thomas Brendan
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2022, 16 : 691 - 723
  • [47] Model-based clustering for populations of networks
    Signorelli, Mirko
    Wit, Ernst C.
    STATISTICAL MODELLING, 2020, 20 (01) : 9 - 29
  • [48] Model-based clustering of longitudinal data
    McNicholas, Paul D.
    Murphy, T. Brendan
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2010, 38 (01): : 153 - 168
  • [49] Boosting for model-based data clustering
    Saffari, Amir
    Bischof, Horst
    PATTERN RECOGNITION, 2008, 5096 : 51 - 60
  • [50] Dimension reduction for model-based clustering
Scrucca, Luca
STATISTICS AND COMPUTING, 2010, 20 : 471 - 484