Model-based clustering of microarray expression data via latent Gaussian mixture models

Cited by: 107
Authors
McNicholas, Paul D. [1]
Murphy, Thomas Brendan [2]
Affiliations
[1] Univ Guelph, Dept Math & Stat, Guelph, ON N1G 2W1, Canada
[2] Univ Coll Dublin, Sch Math Sci, Dublin 4, Ireland
Funding
Canada Foundation for Innovation; Science Foundation Ireland; Natural Sciences and Engineering Research Council of Canada
Keywords
GENE-EXPRESSION; MAXIMUM-LIKELIHOOD; CLASSIFICATION; ALGORITHM
DOI
10.1093/bioinformatics/btq498
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Motivation: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.
Results: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.
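The adjusted Rand index used to score the clusterings compares two partitions of the same items (for example, known tissue classes versus fitted cluster labels) by counting pairs of items that are grouped together or apart in both, then correcting the plain Rand index for chance agreement. The following is a minimal Python sketch of the standard pair-counting formula, not the authors' code; the function name `adjusted_rand_index` is hypothetical and only the standard library is assumed.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items.

    ARI = (index - expected) / (max_index - expected), where
      index     = sum over contingency cells of C(n_ij, 2)
      expected  = [sum_i C(a_i, 2)] * [sum_j C(b_j, 2)] / C(n, 2)
      max_index = ([sum_i C(a_i, 2)] + [sum_j C(b_j, 2)]) / 2
    and a_i, b_j are the cluster sizes in each labeling.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        # No pairs to compare; treat as perfect agreement by convention.
        return 1.0

    # n_ij: number of items placed in cluster i by A and in cluster j by B.
    contingency = Counter(zip(labels_a, labels_b))
    index = sum(comb(c, 2) for c in contingency.values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        # Only happens when both labelings are trivial in the same way
        # (all one cluster, or all singletons), i.e. they agree exactly.
        return 1.0
    return (index - expected) / (max_index - expected)


# A "true" partition versus a clustering that splits one class in two.
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 4))  # 0.5714
# Label names are irrelevant: swapping 0 and 1 still gives perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

A value of 1 indicates identical partitions up to relabelling, values near 0 indicate chance-level agreement, and negative values are possible. In practice, scikit-learn's `adjusted_rand_score` computes the same quantity.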
Pages: 2705-2712
Number of pages: 8
Related papers (50 in total)
  • [21] A mixture model-based approach to the clustering of exponential repeated data
    Martinez, M. J.
    Lavergne, C.
    Trottier, C.
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2009, 100 (09) : 1938 - 1951
  • [22] AN ADAPTIVE SEGMENTATION METHOD BASED ON GAUSSIAN MIXTURE MODEL (GMM) CLUSTERING FOR DNA MICROARRAY
    Parthasarathy, M.
    Ramya, R.
    Vijaya, A.
    [J]. 2014 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING APPLICATIONS (ICICA 2014), 2014, : 73 - 77
  • [23] Multivariate data clustering for the Gaussian mixture model
    Kavaliauskas, M
    Rudzkis, R
    [J]. INFORMATICA, 2005, 16 (01) : 61 - 74
  • [24] Gaussian Mixture Model Clustering with Incomplete Data
    Zhang, Yi
    Li, Miaomiao
    Wang, Siwei
    Dai, Sisi
    Luo, Lei
    Zhu, En
    Xu, Huiying
    Zhu, Xinzhong
    Yao, Chaoyun
    Zhou, Haoran
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [25] Model based clustering of audio clips using Gaussian mixture models
    Chandrakala, S.
    Sekhar, C. Chandra
    [J]. ICAPR 2009: SEVENTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION, PROCEEDINGS, 2009, : 47 - 50
  • [26] Adaptive filtering of microarray gene expression data based on Gaussian mixture decomposition
    Marczyk, Michal
    Jaksik, Roman
    Polanski, Andrzej
    Polanska, Joanna
    [J]. BMC BIOINFORMATICS, 2013, 14
  • [28] Semi-parametric model-based clustering for DNA microarray data
    Han, Bohyung
    Davis, Larry S.
    [J]. 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 3, PROCEEDINGS, 2006, : 324 - +
  • [29] Model-based clustering and data transformations for gene expression data
    Yeung, KY
    Fraley, C
    Murua, A
    Raftery, AE
    Ruzzo, WL
    [J]. BIOINFORMATICS, 2001, 17 (10) : 977 - 987
  • [30] MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING
    BANFIELD, JD
    RAFTERY, AE
    [J]. BIOMETRICS, 1993, 49 (03) : 803 - 821