Biological assessment of robust noise models in microarray data analysis

Cited by: 23
Authors
Posekany, A. [1 ]
Felsenstein, K. [2 ]
Sykacek, P. [1 ]
Affiliations
[1] Univ Nat Resources & Life Sci, Dept Biotechnol, Chair Bioinformat, A-1180 Vienna, Austria
[2] Vienna Univ Technol, Dept Stat, A-1040 Vienna, Austria
Keywords
DIFFERENTIALLY EXPRESSED GENES; NONPARAMETRIC METHODS; NORMALIZATION; TRANSCRIPTOME; MECHANISMS; ONTOLOGY; OBESITY; MICE; TOOL;
DOI
10.1093/bioinformatics/btr018
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Although several recently proposed analysis packages for microarray data can cope with heavy-tailed noise, many applications rely on Gaussian assumptions. Gaussian noise models foster computational efficiency. This comes, however, at the expense of increased sensitivity to outlying observations. Assessing potential insufficiencies of Gaussian noise in microarray data analysis is thus important and of general interest.
Results: To this end, we propose assessing different noise models on a large number of microarray experiments. The goodness of fit of noise models is quantified by a hierarchical Bayesian analysis of variance model, which predicts normalized expression values as a mixture of a Gaussian density and t-distributions with adjustable degrees of freedom. Inference of differentially expressed genes is taken into consideration at a second mixing level. To attain far-reaching validity, our investigations cover a wide range of analysis platforms and experimental settings. As the most striking result, we find that, irrespective of the chosen preprocessing and normalization method, a heavy-tailed noise model is a better fit than a simple Gaussian in all experiments. Further investigations revealed that an appropriate choice of noise model has a considerable influence on biological interpretations drawn at the level of inferred genes and gene ontology terms. We conclude from our investigation that neglecting the overdispersed noise in microarray data can mislead scientific discovery, and we suggest that the convenience of Gaussian-based modelling should be replaced by non-parametric approaches or other methods that account for heavy-tailed noise.
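Illustration (not part of the indexed record): the contrast between Gaussian and heavy-tailed noise described in the abstract can be sketched with a plain likelihood comparison. This is a minimal sketch under stated assumptions, not the authors' hierarchical Bayesian analysis of variance model. The synthetic residuals, the contamination fraction, the grid of degrees of freedom, and all function names are hypothetical choices made for this example only.

```python
import math
import random
import statistics


def gaussian_logpdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(x, mu, sigma, nu):
    """Log density of a location-scale Student-t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


def simulate_residuals(n, outlier_fraction, seed):
    """Synthetic residuals (assumption): unit-variance noise plus a few wide outliers."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 5.0) if rng.random() < outlier_fraction
            else rng.gauss(0.0, 1.0) for _ in range(n)]


def gaussian_loglik(data):
    """Total log-likelihood at the Gaussian maximum-likelihood estimates."""
    mu = sum(data) / len(data)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
    return sum(gaussian_logpdf(x, mu, sigma) for x in data)


def student_t_loglik(data, nus=(1, 2, 3, 4, 6, 8, 12, 20, 50)):
    """Best total log-likelihood over a coarse grid of nu and scale.

    The location is fixed at the median and the scale grid is built from
    the median absolute deviation; this is a crude grid search, not a
    full maximum-likelihood or Bayesian fit.
    """
    mu = statistics.median(data)
    mad = statistics.median([abs(x - mu) for x in data])
    base = 1.4826 * mad  # MAD rescaled to match a Gaussian standard deviation
    best = (-math.inf, None, None)
    for nu in nus:
        for k in range(5, 21):
            sigma = base * k / 10.0
            ll = sum(student_t_logpdf(x, mu, sigma, nu) for x in data)
            if ll > best[0]:
                best = (ll, nu, sigma)
    return best


residuals = simulate_residuals(n=2000, outlier_fraction=0.1, seed=0)
gauss_ll = gaussian_loglik(residuals)
t_ll, best_nu, best_sigma = student_t_loglik(residuals)
print(f"Gaussian log-likelihood:  {gauss_ll:.1f}")
print(f"Student-t log-likelihood: {t_ll:.1f} (nu={best_nu}, sigma={best_sigma:.2f})")
```

On contaminated data of this kind, the Student-t fit would be expected to reach a higher log-likelihood than the Gaussian fit, which is the qualitative effect the abstract reports for real microarray experiments. The sketch says nothing about the paper's actual datasets or numerical results.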
Pages: 807 - 814
Page count: 8
Related papers
50 in total
  • [21] Noise reduction in microarray gene expression data based on spectral analysis
    Tang, Vivian T. Y.
    Yan, Hong
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2012, 3 (01) : 51 - 57
  • [22] Robust Data Worth Analysis with Surrogate Models
    Gosses, Moritz
    Wohling, Thomas
    GROUNDWATER, 2021, 59 (05) : 728 - 744
  • [23] Linear models for microarray data analysis: Hidden similarities and differences
    Kerr, MK
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2003, 10 (06) : 891 - 901
  • [24] Empirical Bayes analysis of variance component models for microarray data
    S. Feng
    R. D. Wolfinger
    T. M. Chu
    G. C. Gibson
    L. A. McGraw
    Journal of Agricultural, Biological, and Environmental Statistics, 2006, 11
  • [25] Empirical Bayes analysis of variance component models for microarray data
    Feng, S.
    Wolfinger, R. D.
    Chu, T. M.
    Gibson, G. C.
    McGraw, L. A.
    JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS, 2006, 11 (02) : 197 - 209
  • [26] Noise robust discriminative models
    Le, Q
    Bengio, S
    Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, Vols 1 and 2, 2004, : 375 - 378
  • [27] DATE Analysis: A General Theory of Biological Change Applied to Microarray Data
    Rasnick, David
    BIOTECHNOLOGY PROGRESS, 2009, 25 (05) : 1275 - 1288
  • [28] Assessment of gene set analysis methods based on microarray data
    Alavi-Majd, Hamid
    Khodakarim, Soheila
    Zayeri, Farid
    Rezaei-Tavirani, Mostafa
    Tabatabaei, Seyyed Mohammad
    Heydarpour-Meymeh, Maryam
    GENE, 2014, 534 (02) : 383 - 389
  • [29] Hidden Markov models for microarray time course data in multiple biological conditions - Discussion
    Li, Hongzhe
    Hong, Fangxin
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (476) : 1332 - 1334
  • [30] Topological Data Analysis of Biological Aggregation Models
    Topaz, Chad M.
    Ziegelmeier, Lori
    Halverson, Tom
    PLOS ONE, 2015, 10 (05):