Escaping The Curse of Dimensionality in Bayesian Model-Based Clustering

Citations: 0

Authors:

Chandra, Noirrit Kiran [1]

Canale, Antonio [2]

Dunson, David B. [3]

Affiliations:

[1] Univ Texas Dallas, Dept Math Sci, Richardson, TX 75080 USA

[2] Univ Padua, Dept Stat Sci, Padua, Italy

[3] Dept Stat Sci, Durham, NC USA

Source:

JOURNAL OF MACHINE LEARNING RESEARCH | 2023 / Vol. 24

Keywords:

Big data; Clustering; Dirichlet process; Exchangeable partition probability function; High dimensional; Latent variables; Mixture model; COVARIANCE-MATRIX ESTIMATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Bayesian mixture models are widely used for clustering of high-dimensional data with appropriate uncertainty quantification. However, as the dimension of the observations increases, posterior inference often tends to favor too many or too few clusters. This article explains this behavior by studying the random partition posterior in a non-standard setting with a fixed sample size and increasing data dimensionality. We provide conditions under which the finite sample posterior tends to either assign every observation to a different cluster or all observations to the same cluster as the dimension grows. Interestingly, the conditions do not depend on the choice of clustering prior, as long as all possible partitions of observations into clusters have positive prior probabilities, and hold irrespective of the true data-generating model. We then propose a class of latent mixtures for Bayesian clustering (Lamb) on a set of low-dimensional latent variables inducing a partition on the observed data. The model is amenable to scalable posterior inference and we show that it can avoid the pitfalls of high-dimensionality under mild assumptions. The proposed approach is shown to have good performance in simulation studies and an application to inferring cell types based on scRNAseq.


Pages: 42
