Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements

Times cited: 44
Authors
Cooke, Emma J. [2]
Savage, Richard S. [1]
Kirk, Paul D. W. [1]
Darkins, Robert [1]
Wild, David L. [1]
Affiliations
[1] Univ Warwick, Syst Biol Ctr, Coventry CV4 7AL, W Midlands, England
[2] Univ Warwick, Dept Chem, Coventry CV4 7AL, W Midlands, England
Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
GENE-EXPRESSION; MIXTURE MODEL
DOI
10.1186/1471-2105-12-399
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques.

Results: We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles.

Conclusions: By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies.
Timeseries BHC is available as part of the R package 'BHC' (version 1.5), which is available for download from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.
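The abstract names two noise-handling ingredients: a mixture-model likelihood that lets a small fraction of measurements be treated as outliers, and an empirical Bayes prior on the noise variance informed by replicates. The Python sketch below illustrates both ideas in a deliberately simplified form; it is not the BHC package's code or API. It assumes a two-component, zero-mean Gaussian mixture on residuals from an already-fitted expression profile (in the paper the likelihood sits inside a Gaussian process model), and it assumes a Gamma prior on the noise precision fitted by simple moment matching to per-gene replicate variances. All function names and the moment-matching choice are hypothetical.

```python
import math
from statistics import mean, variance


def gaussian_logpdf(x, mu, var):
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def _log_components(r, noise_var, outlier_var, outlier_frac):
    # Log of (mixing weight * density) for the inlier and the outlier component.
    log_in = math.log(1.0 - outlier_frac) + gaussian_logpdf(r, 0.0, noise_var)
    log_out = math.log(outlier_frac) + gaussian_logpdf(r, 0.0, outlier_var)
    return log_in, log_out


def _logsumexp2(a, b):
    # Numerically stable log(exp(a) + exp(b)).
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mixture_loglik(residuals, noise_var, outlier_var, outlier_frac):
    """Hypothetical outlier-robust log-likelihood: each residual is ordinary
    noise with probability 1 - outlier_frac, otherwise it is drawn from a much
    broader outlier component (outlier_var >> noise_var)."""
    return sum(
        _logsumexp2(*_log_components(r, noise_var, outlier_var, outlier_frac))
        for r in residuals
    )


def outlier_responsibility(r, noise_var, outlier_var, outlier_frac):
    """Posterior probability that residual r came from the outlier component."""
    log_in, log_out = _log_components(r, noise_var, outlier_var, outlier_frac)
    return math.exp(log_out - _logsumexp2(log_in, log_out))


def fit_gamma_precision_prior(replicate_sets):
    """Hypothetical empirical Bayes step: convert within-replicate sample
    variances (each set needs >= 2 replicates) into precision estimates, then
    moment-match a Gamma(shape, rate) prior on the noise precision."""
    precisions = [1.0 / variance(reps) for reps in replicate_sets]
    m, v = mean(precisions), variance(precisions)
    # Gamma(shape, rate) has mean shape / rate and variance shape / rate**2.
    return m * m / v, m / v


if __name__ == "__main__":
    residuals = [0.1, -0.2, 5.0]  # toy residuals; 5.0 plays the outlier
    print(mixture_loglik(residuals, noise_var=0.04, outlier_var=25.0, outlier_frac=0.05))
    print([round(outlier_responsibility(r, 0.04, 25.0, 0.05), 3) for r in residuals])

    # Toy replicate measurements: one list per gene/time point.
    replicates = [[1.0, 1.2, 0.9], [2.0, 2.4, 2.2], [0.5, 0.4, 0.7], [3.0, 3.3, 2.8]]
    print(fit_gamma_precision_prior(replicates))
```

With a pure Gaussian likelihood, the single large residual would dominate the fit; under the mixture it is absorbed by the broad component, so the remaining points still score well. This is the intuition behind letting noisy genes join clusters of functionally similar genes.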
Pages: 12
Related papers
  • [1] Hierarchical Signature Clustering for Time Series Microarray Data
    Koenig, Lars
    Youn, Eunseog
    [J]. SOFTWARE TOOLS AND ALGORITHMS FOR BIOLOGICAL SYSTEMS, 2011, 696: 57-65
  • [2] Accelerating Bayesian Hierarchical Clustering of Time Series Data with a Randomised Algorithm
    Darkins, Robert
    Cooke, Emma J.
    Ghahramani, Zoubin
    Kirk, Paul D. W.
    Wild, David L.
    Savage, Richard S.
    [J]. PLOS ONE, 2013, 8 (04)
  • [3] R/BHC: fast Bayesian hierarchical clustering for microarray data
    Savage, Richard S.
    Heller, Katherine
    Xu, Yang
    Ghahramani, Zoubin
    Truman, William M.
    Grant, Murray
    Denby, Katherine J.
    Wild, David L.
    [J]. BMC BIOINFORMATICS, 2009, 10
  • [4] Difference-based clustering of short time-course microarray data with replicates
    Kim, Jihoon
    Kim, Ju Han
    [J]. BMC BIOINFORMATICS, 2007, 8 (1)
  • [5] Application of Agglomerative Hierarchical Clustering for Clustering of Time Series Data
    Radovanovic, Ana
    Li, Junshi
    Milanovic, Jovica V.
    Milosavljevic, Nina
    Storchi, Riccardo
    [J]. 2020 IEEE PES INNOVATIVE SMART GRID TECHNOLOGIES EUROPE (ISGT-EUROPE 2020): SMART GRIDS: KEY ENABLERS OF A GREEN POWER SYSTEM, 2020: 640-644
  • [6] Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters
    Hensman, James
    Lawrence, Neil D.
    Rattray, Magnus
    [J]. BMC BIOINFORMATICS, 2013, 14