Including Probe-Level Measurement Error in Robust Mixture Clustering of Replicated Microarray Gene Expression

Times cited: 3
Authors
Liu, Xuejun [1]
Rattray, Magnus [2, 3]
Affiliations
[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology, Nanjing, People's Republic of China
[2] University of Sheffield, Department of Computer Science, Sheffield S10 2TN, South Yorkshire, England
[3] University of Sheffield, Sheffield Institute for Translational Neuroscience, Sheffield S10 2TN, South Yorkshire, England
Funding
US National Science Foundation
Keywords
microarray data; gene expression clustering; mixture models; probabilistic model; cycle; identification; transcription; yeast
DOI
10.2202/1544-6115.1600
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Probabilistic mixture models provide a popular approach to clustering noisy gene expression data for exploring gene function. Because gene expression data obtained from microarray experiments are often subject to substantial technical and biological noise, replicated experiments are typically used to deal with data variability, and internal replication (e.g. from multiple probes per gene in an experiment) provides valuable information about technical sources of noise. However, current implementations of mixture models either do not consider the correlation between replicated measurements for the same experimental condition or ignore probe-level measurement error, and thus overlook the rich information about technical noise. Moreover, most current methods use non-robust Gaussian components to describe the data and are therefore sensitive to non-Gaussian clusters and outliers. In many cases this leads to over-estimation of the number of model components, because multiple Gaussian components are used to fit a single non-Gaussian cluster. We propose a robust Student's t-mixture model that explicitly handles replicated gene expression data, incorporates probe-level measurement error when it is available, and automatically selects the appropriate number of model components using a minimum message length criterion. We apply the model to gene expression data using probe-level measurements from multi-mgMOS, an Affymetrix probe-level model that provides uncertainty estimates. On synthetic data sets with realistic noise characteristics, the proposed Student's t-mixture model performs robustly in comparison with a standard Gaussian mixture model and two other previously published methods.
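The robustness of Student's t components comes from a latent per-observation scale weight: in EM, each point's contribution to a component is multiplied by u = (ν + d)/(ν + δ), where d is the data dimension and δ is the point's squared Mahalanobis distance to the component, so distant outliers are down-weighted instead of pulling the fit toward them. The sketch below is a minimal numpy illustration of EM for a t-mixture with a fixed, user-chosen ν. It is not the authors' implementation: it ignores replicate structure, probe-level measurement error, and component selection, and the names `log_t_pdf` and `fit_t_mixture` are hypothetical.

```python
from math import lgamma, pi

import numpy as np


def log_t_pdf(X, mu, Sigma, nu):
    """Log density of a multivariate Student's t for each row of X, plus Mahalanobis distances."""
    d = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (X - mu).T)  # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)       # squared Mahalanobis distance per point
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    const = (lgamma((nu + d) / 2) - lgamma(nu / 2)
             - 0.5 * d * np.log(nu * pi) - 0.5 * log_det)
    return const - 0.5 * (nu + d) * np.log1p(maha / nu), maha


def fit_t_mixture(X, init_means, nu=4.0, n_iter=100, eps=1e-6):
    """EM for a K-component multivariate t-mixture with fixed degrees of freedom nu (assumption)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    mu = np.array(init_means, dtype=float)
    K = mu.shape[0]
    start_cov = np.atleast_2d(np.cov(X, rowvar=False)) + eps * np.eye(d)
    Sigma = np.array([start_cov.copy() for _ in range(K)])
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities R and latent scale weights U
        log_p = np.empty((n, K))
        maha = np.empty((n, K))
        for k in range(K):
            log_p[:, k], maha[:, k] = log_t_pdf(X, mu[k], Sigma[k], nu)
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
        R = np.exp(log_p)
        R /= R.sum(axis=1, keepdims=True)
        U = (nu + d) / (nu + maha)  # small for points far from a component
        # M-step: outlier-down-weighted means and scale matrices
        for k in range(K):
            w = R[:, k] * U[:, k]
            mu[k] = w @ X / w.sum()
            diff = X - mu[k]
            Sigma[k] = (w[:, None] * diff).T @ diff / R[:, k].sum() + eps * np.eye(d)
        weights = R.mean(axis=0)
    return weights, mu, Sigma, R, U
```

As ν grows, every u tends to 1 and the updates reduce to ordinary Gaussian-mixture EM, which is one way to see why Gaussian components have no built-in protection against outliers.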
We also compare performance with these methods on two yeast time-course data sets and show that the new method obtains more biologically meaningful clusters, as measured by enrichment statistics for GO categories and for interactions between transcription factors and genes. Selecting the number of components automatically is more computationally efficient than a separate model-selection approach, and allows the method to be applied to larger data sets.
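Automatic selection of the number of components can be illustrated with a minimal sketch. It assumes an MML-penalised mixing-weight update of the form used by Figueiredo and Jain (2002), in which component k receives weight proportional to max(0, n_k − T/2), where n_k is its summed responsibility and T is the number of free parameters per component; the paper's exact criterion is not reproduced here. EM starts with more components than needed and drops any component whose weight reaches zero, so a single run replaces fitting and comparing several models. For brevity the sketch uses one-dimensional Gaussian components rather than Student's t components, and `mml_weights` and `fit_gmm_mml_1d` are hypothetical names.

```python
import numpy as np


def mml_weights(resp_sums, n_params):
    """Assumed Figueiredo-Jain style MML weight update.

    Components whose effective count falls below n_params / 2 receive weight 0.
    """
    w = np.maximum(0.0, np.asarray(resp_sums, dtype=float) - n_params / 2.0)
    total = w.sum()
    if total == 0:
        raise ValueError("all components were annihilated")
    return w / total


def fit_gmm_mml_1d(x, init_means, n_iter=200):
    """1-D Gaussian mixture EM that starts with too many components and prunes them."""
    x = np.asarray(x, dtype=float)
    means = np.array(init_means, dtype=float)
    k = len(means)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    n_params = 2  # one mean and one variance per 1-D Gaussian component
    for _ in range(n_iter):
        # E-step in log space for numerical stability
        log_p = (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: penalised weight update, then drop annihilated components
        weights = mml_weights(resp.sum(axis=0), n_params)
        keep = weights > 0
        weights, resp = weights[keep], resp[:, keep]
        nk = resp.sum(axis=0)
        means = resp.T @ x / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances
```

For example, with data clustered near −3 and 3 and initial means `[-3.0, 3.0, 100.0]`, the third component's effective count falls below T/2 = 1 on the first iteration, so it is annihilated and two components remain.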
Pages: 23
Related papers (16 in total)
  • [2] Including probe-level uncertainty in model-based gene expression clustering
    Liu, Xuejun
    Lin, Kevin K.
    Andersen, Bogi
    Rattray, Magnus
    BMC BIOINFORMATICS, 2007, 8 (1)
  • [3] Probe-level measurement error improves accuracy in detecting differential gene expression
    Liu, Xuejun
    Milo, Marta
    Lawrence, Neil D.
    Rattray, Magnus
    BIOINFORMATICS, 2006, 22 (17) : 2107 - 2113
  • [4] Hierarchical Clustering of Microarray Data with Probe-level Uncertainty
    Gullo, F.
    Ponti, G.
    Tagarelli, A.
    Tradigo, G.
    Veltri, P.
    2009 22ND IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, 2009 : 357 - +
  • [5] Robust Bayesian Clustering for Replicated Gene Expression Data
    Sun, Jianyong
    Garibaldi, Jonathan M.
    Kenobi, Kim
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2012, 9 (05) : 1504 - 1514
  • [6] Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression
    Lemieux, Sébastien
    BMC BIOINFORMATICS, 7
  • [9] A robust method for estimating gene expression states using Affymetrix microarray probe level data
    Ohtaki, Megu
    Otani, Keiko
    Hiyama, Keiko
    Kamei, Naomi
    Satoh, Kenichi
    Hiyama, Eiso
    BMC BIOINFORMATICS, 2010, 11
  • [10] A comparison of probe-level and probeset models for small-sample gene expression data
    Stevens, John R.
    Bell, Jason L.
    Aston, Kenneth I.
    White, Kenneth L.
    BMC BIOINFORMATICS, 2010, 11