Overfitting Bayesian mixtures of factor analyzers with an unknown number of components

Times cited: 8
Author
Papastamoulis, Panagiotis [1]
Affiliation
[1] Univ Manchester, Fac Biol Med & Hlth, Div Informat Imaging & Data Sci, Michael Smith Bldg,Oxford Rd, Manchester M13 9PL, Lancs, England
Keywords
Factor analysis; Mixture models; Clustering; MCMC; CHAIN MONTE-CARLO; LABEL SWITCHING PROBLEM; MAXIMUM-LIKELIHOOD; R PACKAGE; MODEL; DISTRIBUTIONS; DEVIANCE; CRITERIA
DOI
10.1016/j.csda.2018.03.007
CLC number (Chinese Library Classification)
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Recent advances in overfitting Bayesian mixture models provide a solid and straightforward approach for inferring the underlying number of clusters and the model parameters in heterogeneous datasets. The applicability of such a framework to clustering correlated high-dimensional data is demonstrated. For this purpose an overfitting mixture of factor analyzers is introduced, assuming that the number of factors is fixed. A Markov chain Monte Carlo (MCMC) sampler combined with a prior parallel tempering scheme is used to estimate the posterior distribution of the model parameters. The optimal number of factors is estimated using information criteria. Identifiability issues related to the label switching problem are dealt with by post-processing the simulated MCMC sample with relabeling algorithms. The method is benchmarked against state-of-the-art software for maximum likelihood estimation of mixtures of factor analyzers in an extensive simulation study. Finally, the applicability of the method is illustrated on publicly available data. (C) 2018 Elsevier B.V. All rights reserved.
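In an overfitting mixture, the model is fitted with more components than are expected to be needed, and the number of clusters is read off from how many components stay occupied across the MCMC draws. The sketch below is a minimal illustration of that counting step, assuming the sampler output is available as a list of allocation vectors (one per retained iteration, with `draws[t][i]` the component label of observation `i`). The function names `count_nonempty` and `estimate_num_clusters` and the toy draws are hypothetical; this is not the paper's implementation or the API of its software.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def count_nonempty(allocations: Sequence[int]) -> int:
    """Number of distinct components with at least one observation assigned."""
    return len(set(allocations))


def estimate_num_clusters(
    draws: Sequence[Sequence[int]],
) -> tuple[int, dict[int, float]]:
    """Most frequent number of non-empty components across MCMC draws.

    `draws` is a hypothetical list of allocation vectors, one per retained
    iteration of an overfitting-mixture sampler. Returns the mode together
    with the empirical distribution of the number of occupied components.
    """
    counts = Counter(count_nonempty(z) for z in draws)
    total = sum(counts.values())
    posterior = {k: c / total for k, c in sorted(counts.items())}
    # max() returns the first maximal key, and keys are iterated in
    # ascending order, so ties are broken in favour of the smaller K.
    k_hat = max(sorted(posterior), key=lambda k: posterior[k])
    return k_hat, posterior


# Toy usage: four draws from a hypothetical sampler with 5 components (labels 0-4).
draws = [
    [0, 0, 2, 2, 2, 4],  # 3 occupied components
    [0, 0, 2, 2, 1, 4],  # 4 occupied components
    [1, 1, 2, 2, 2, 4],  # 3 occupied components
    [0, 0, 0, 2, 2, 2],  # 2 occupied components
]
k_hat, posterior = estimate_num_clusters(draws)
print(k_hat, posterior)  # 3 {2: 0.25, 3: 0.5, 4: 0.25}
```

Counting occupied components only depends on which labels are distinct, so it is unaffected by label switching; permuting the labels within a draw leaves the count unchanged. Relabeling is still needed for component-specific summaries such as cluster means or loadings.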
Pages: 220-234
Page count: 15