Model-based clustering of microarray expression data via latent Gaussian mixture models

Cited by: 107
Authors
McNicholas, Paul D. [1 ]
Murphy, Thomas Brendan [2 ]
Affiliations
[1] Univ Guelph, Dept Math & Stat, Guelph, ON N1G 2W1, Canada
[2] Univ Coll Dublin, Sch Math Sci, Dublin 4, Ireland
Funding
Canada Foundation for Innovation; Science Foundation Ireland; Natural Sciences and Engineering Research Council of Canada;
Keywords
GENE-EXPRESSION; MAXIMUM-LIKELIHOOD; CLASSIFICATION; ALGORITHM;
DOI
10.1093/bioinformatics/btq498
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.
Results: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.
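The abstract reports clustering performance with the adjusted Rand index. As a minimal sketch of how that index can be computed from two label vectors (this is not the authors' code; the function name `adjusted_rand_index` and the toy labels are illustrative assumptions), the standard pair-counting form is:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting adjusted Rand index between two partitions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two observations: the partitions trivially agree.
        return 1.0

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of pairs placed together in both partitions.
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    # Expected value of the index under random labelling with fixed margins.
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one group.
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy labels (made up): the same grouping under different label names.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index is 1 when two partitions agree up to relabelling and has expected value 0 under random assignment, which is why it is commonly preferred over the unadjusted Rand index when comparing clusterings with different numbers of groups.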
Pages: 2705 - 2712
Number of pages: 8
Related papers
50 in total
  • [1] A mixture model-based approach to the clustering of microarray expression data
    McLachlan, GJ
    Bean, RW
    Peel, D
    [J]. BIOINFORMATICS, 2002, 18 (03) : 413 - 422
  • [2] Model-based classification using latent Gaussian mixture models
    McNicholas, Paul D.
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2010, 140 (05) : 1175 - 1181
  • [3] Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models
    McNicholas, P. D.
    Murphy, T. B.
    McDaid, A. F.
    Frost, D.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2010, 54 (03) : 711 - 723
  • [4] Mixture of latent trait analyzers for model-based clustering of categorical data
    Gollini, Isabella
    Murphy, Thomas Brendan
    [J]. STATISTICS AND COMPUTING, 2014, 24 (04) : 569 - 588
  • [5] Mixture of latent trait analyzers for model-based clustering of categorical data
    Isabella Gollini
    Thomas Brendan Murphy
    [J]. Statistics and Computing, 2014, 24 : 569 - 588
  • [6] INTEGRATIVE MODEL-BASED CLUSTERING OF MICROARRAY METHYLATION AND EXPRESSION DATA
    Kormaksson, Matthias
    Booth, James G.
    Figueroa, Maria E.
    Melnick, Ari
    [J]. ANNALS OF APPLIED STATISTICS, 2012, 6 (03) : 1327 - 1347
  • [7] Finite mixture models and model-based clustering
    Melnykov, Volodymyr
    Maitra, Ranjan
    [J]. STATISTICS SURVEYS, 2010, 4 : 80 - 116
  • [8] Gaussian mixture clustering and imputation of microarray data
    Ouyang, M
    Welsh, WJ
    Georgopoulos, P
    [J]. BIOINFORMATICS, 2004, 20 (06) : 917 - 923
  • [9] The natural course of atopic dermatitis - Model-based clustering by latent class mixture models
    Diepgen, TL
    Kuss, O
    Gromann, C
    [J]. JOURNAL OF INVESTIGATIVE DERMATOLOGY, 2005, 125 (04) : 853 - 853
  • [10] A new clustering method of gene expression data based on multivariate Gaussian mixture models
    Liu, Zhe
    Song, Yu-qing
    Xie, Cong-hua
    Tang, Zheng
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2016, 10 (02) : 359 - 368