Bayesian Correlation Analysis for Sequence Count Data

Cited by: 6
Authors
Sanchez-Taltavull, Daniel [1 ,2 ]
Ramachandran, Parameswaran [1 ,2 ]
Lau, Nelson [1 ]
Perkins, Theodore J. [1 ,2 ]
Affiliations
[1] Ottawa Hosp Res Inst, Regenerat Med Program, Ottawa, ON, Canada
[2] Univ Ottawa, Dept Biochem Microbiol & Immunol, Ottawa, ON, Canada
Source
PLOS ONE | 2016 / Vol. 11 / Issue 10
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
EXPRESSION; CANCER;
DOI
10.1371/journal.pone.0163595
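Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes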
07 ; 0710 ; 09 ;
Abstract
Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.
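The idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes a Beta posterior on each entity's read fraction in each experiment, and it treats an entity's signal as a mixture of those posteriors over experiments, so that posterior uncertainty inflates the variance in the denominator. The function name `bayesian_correlation` and the parameters `depth`, `alpha0`, and `beta0` are hypothetical. The default uniform prior (`alpha0 = beta0 = 1`) is used only for illustration; the abstract cautions that uniform priors can bias estimates when sequencing depths vary widely.

```python
import numpy as np

def bayesian_correlation(counts, depth, alpha0=1.0, beta0=1.0):
    """Sketch of an uncertainty-aware correlation (assumed Beta posteriors).

    counts: (entities, experiments) read counts.
    depth:  (experiments,) total reads per experiment.
    alpha0, beta0: Beta prior parameters (hypothetical defaults).
    """
    counts = np.asarray(counts, dtype=float)
    depth = np.asarray(depth, dtype=float)
    a = alpha0 + counts                      # posterior Beta parameters
    b = beta0 + depth - counts
    mean = a / (a + b)                       # posterior mean per measurement
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))  # posterior variance
    centered = mean - mean.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / counts.shape[1]
    # Total variance = spread of means across experiments + mean uncertainty.
    total_var = np.diag(cov) + var.mean(axis=1)
    return cov / np.sqrt(np.outer(total_var, total_var))

# Two high-count entities and two low-count entities, equal depths.
counts = np.array([[1000, 2000, 3000, 4000],
                   [1100, 1900, 3100, 3900],
                   [1, 2, 3, 4],
                   [1, 2, 3, 4]])
depth = np.full(4, 1e6)
r = bayesian_correlation(counts, depth)
print(r[0, 1], r[2, 3])
```

In this toy input the two low-count rows are identical, so their Pearson correlation is exactly 1, yet the sketch returns a much smaller value because the posterior variance dominates. The diagonal of the returned matrix is also below 1 for the same reason.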
Pages: 24
Related papers
50 in total
  • [1] A Bayesian analysis of frequency count data
    Font, M.
    Puig, X.
    Ginebra, J.
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2013, 83 (02) : 229 - 246
  • [2] Bayesian analysis of the differences of count data
    Karlis, D
    Ntzoufras, I
    [J]. STATISTICS IN MEDICINE, 2006, 25 (11) : 1885 - 1905
  • [3] Empirical Bayesian Selection of Hypothesis Testing Procedures for Analysis of Sequence Count Expression Data
    Pounds, Stanley B.
    Gao, Cuilan L.
    Zhang, Hui
    [J]. STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2012, 11 (05)
  • [4] Bayesian negative binomial mixture regression models for the analysis of sequence count and methylation data
    Li, Qiwei
    Cassese, Alberto
    Guindani, Michele
    Vannucci, Marina
    [J]. BIOMETRICS, 2019, 75 (01) : 183 - 192
  • [5] Sequential Bayesian Analysis of Multivariate Count Data
    Aktekin, Tevfik
    Polson, Nick
    Soyer, Refik
    [J]. BAYESIAN ANALYSIS, 2018, 13 (02) : 385 - 409
  • [6] Differential expression analysis for sequence count data
    Anders, Simon
    Huber, Wolfgang
    [J]. GENOME BIOLOGY, 2010, 11 (10):
  • [7] Differential expression analysis for sequence count data
    Simon Anders
    Wolfgang Huber
    [J]. Genome Biology, 11
  • [8] Bayesian analysis of econometric models for count data: A survey
    Winkelmann, R
    [J]. EXPLORATORY DATA ANALYSIS IN EMPIRICAL RESEARCH, PROCEEDINGS, 2003, : 204 - 215
  • [9] Analysis of longitudinal count data with serial correlation
    Xu, Stanley
    Jones, Richard H.
    Grunwald, Gary K.
    [J]. BIOMETRICAL JOURNAL, 2007, 49 (03) : 416 - 428
  • [10] baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data
    Thomas J Hardcastle
    Krystyna A Kelly
    [J]. BMC Bioinformatics, 11