Replica analysis of Bayesian data clustering

Authors
Mozeika, Alexander [1]
Coolen, Anthony C. C. [2, 3]
Affiliations
[1] Kings Coll London, Inst Math & Mol Biomed, Hodgkin Bldg, London SE1 1UL, England
[2] Kings Coll London, Dept Math, London WC2R 2LS, England
[3] London Inst Math Sci, 35A South St, London W1K 2XF, England
Funding
Medical Research Council (UK)
Keywords
clustering; Bayesian inference; replica; STATISTICAL-MECHANICS; CLASSIFICATION
DOI
10.1088/1751-8121/ab59af
Chinese Library Classification number
O4 [Physics]
Subject classification code
0702
Abstract
We use statistical mechanics to study model-based Bayesian data clustering. In this approach, each partition of the data into clusters is regarded as a microscopic system state, the negative data log-likelihood gives the energy of each state, and the data set realisation acts as disorder. Optimal clustering corresponds to the ground state of the system, and is hence obtained from the free energy via a low 'temperature' limit. We assume that for large sample sizes the free energy density is self-averaging, and we use the replica method to compute the asymptotic free energy density. The main order parameter in the resulting (replica symmetric) theory, the distribution of the data over the clusters, satisfies a self-consistent equation which can be solved by a population dynamics algorithm. From this order parameter one computes the average free energy, and all relevant macroscopic characteristics of the problem. The theory describes numerical experiments perfectly, and gives a significant improvement over the mean-field theory that was used to study this model in the past.
Pages: 32
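
The mapping described in the abstract (partitions as microscopic states, negative log-likelihood as energy, optimal clustering as the ground state reached in a low-temperature limit) can be illustrated on a toy data set small enough to enumerate every partition. The following is a minimal sketch, not the authors' method: it assumes one-dimensional data, two clusters, and a unit-variance Gaussian model per cluster with the mean set to its maximum-likelihood value, and it drops the additive constant of the log-likelihood. The names `energy`, `free_energy`, `data` and `N_CLUSTERS` are hypothetical, and the replica calculation and population dynamics algorithm of the paper are not reproduced here.

```python
import itertools
import math

# Toy data (an assumption for illustration): two well-separated groups.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
N_CLUSTERS = 2


def energy(labels):
    """Negative log-likelihood of a partition, up to an additive constant.

    Assumes a unit-variance Gaussian per cluster whose mean is the
    maximum-likelihood estimate (the cluster average).
    """
    total = 0.0
    for k in range(N_CLUSTERS):
        members = [x for x, c in zip(data, labels) if c == k]
        mean = sum(members) / len(members)
        total += 0.5 * sum((x - mean) ** 2 for x in members)
    return total


def free_energy(energies, beta):
    """F(beta) = -(1/beta) * log(sum_c exp(-beta * E(c))), computed stably."""
    e_min = min(energies)
    z_shifted = sum(math.exp(-beta * (e - e_min)) for e in energies)
    return e_min - math.log(z_shifted) / beta


# Microscopic states: every assignment of points to clusters
# in which no cluster is left empty.
states = [
    c for c in itertools.product(range(N_CLUSTERS), repeat=len(data))
    if len(set(c)) == N_CLUSTERS
]
energies = [energy(c) for c in states]

# Ground state = partition with the lowest energy (the optimal clustering).
ground_state = min(states, key=energy)
ground_energy = min(energies)

# The free energy approaches the ground-state energy as beta grows,
# i.e. as the 'temperature' 1/beta goes to zero.
for beta in (0.1, 1.0, 10.0, 50.0):
    print(f"beta={beta:5.1f}  F={free_energy(energies, beta):.4f}")
print("ground state:", ground_state, "energy:", round(ground_energy, 4))
```

Exhaustive enumeration scales exponentially with the number of data points, so it is only useful as a sanity check on very small samples; it is shown here purely to make the state, energy and ground-state vocabulary concrete.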