NBLDA: negative binomial linear discriminant analysis for RNA-Seq data

Cited by: 27
Authors
Dong, Kai [1]
Zhao, Hongyu [2]
Tong, Tiejun [1]
Wan, Xiang [3, 4]
Affiliations
[1] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
[2] Yale Univ, Dept Biostat, New Haven, CT 06510 USA
[3] Hong Kong Baptist Univ, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China
[4] Hong Kong Baptist Univ, Inst Computat & Theoret Studies, Kowloon Tong, Hong Kong, Peoples R China
Source
BMC BIOINFORMATICS | 2016 / Volume 17
Funding
US National Institutes of Health; National Natural Science Foundation of China;
Keywords
RNA-Seq; Negative binomial distribution; Linear discriminant analysis; IDENTIFYING DIFFERENTIAL EXPRESSION; CLASSIFICATION; DISPERSION; TUMORS;
DOI
10.1186/s12859-016-1208-1
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: RNA-sequencing (RNA-Seq) has become a powerful technology for characterizing gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods developed for microarray data can be applied to RNA-Seq data, they are not ideal because RNA-Seq data are discrete counts. The Poisson and negative binomial distributions are commonly used to model count data. Recently, Witten (Annals Appl Stat 5: 2493-2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may be less appropriate than the negative binomial distribution when biological replicates are available and the data are overdispersed (i.e., when the variance is larger than the mean). However, negative binomial variables are more complicated to model because they involve a dispersion parameter that must be estimated.

Results: In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. Using Bayes' rule, we construct the classifier by fitting a negative binomial model, and we propose plug-in rules to estimate the unknown parameters in the classifier. We explore the relationship between the negative binomial classifier and the Poisson classifier, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications.

Conclusions: We have developed a new classifier that uses the negative binomial model for RNA-Seq data classification. Our simulation results show that the proposed classifier performs better than existing methods, and it can serve as an effective tool for classifying RNA-Seq data. Based on the comparison results, we provide guidelines to help scientists decide which method to use in the discriminant analysis of RNA-Seq data. R code is available at http://www.comp.hkbu.edu.hk/~xwan/NBLDA.R or https://github.com/yangchadam/NBLDA
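The abstract describes the classifier only in words, so the following is a minimal Python sketch of the general idea rather than the authors' implementation (their R code is at the URLs above). It assumes that genes are independent within a class, that the negative binomial is parameterized by a mean mu and a dispersion phi with variance mu + phi * mu^2, that dispersions are supplied by the caller instead of being estimated, and that library-size normalization is ignored. The function names and the pseudo-count smoothing are hypothetical choices made for illustration.

```python
import math

def nb_log_pmf(x, mu, phi):
    """Log-probability of count x under a negative binomial with
    mean mu and dispersion phi (variance = mu + phi * mu**2)."""
    r = 1.0 / phi
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + x * math.log(phi * mu / (1.0 + phi * mu))
            - r * math.log(1.0 + phi * mu))

def fit_class_means(X, y, pseudo=1.0):
    """Plug-in estimate of the per-class, per-gene mean count.
    The pseudo-count (an assumption of this sketch) keeps every mean
    strictly positive so that log(mu) is defined."""
    means = {}
    for k in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == k]
        means[k] = [(sum(col) + pseudo) / (len(rows) + pseudo)
                    for col in zip(*rows)]
    return means

def classify(x_new, means, priors, phi):
    """Bayes' rule: pick the class maximizing log prior + log likelihood,
    with the likelihood a product of per-gene negative binomials."""
    scores = {}
    for k, mu_k in means.items():
        scores[k] = math.log(priors[k]) + sum(
            nb_log_pmf(x, mu, phi_g)
            for x, mu, phi_g in zip(x_new, mu_k, phi))
    return max(scores, key=scores.get), scores

# Toy data: 4 samples x 2 genes, two classes (values are made up).
X = [[10, 1], [12, 0], [1, 9], [0, 11]]
y = [0, 0, 1, 1]
means = fit_class_means(X, y)
priors = {0: 0.5, 1: 0.5}
phi = [0.2, 0.2]  # per-gene dispersions, assumed known here
label, scores = classify([11, 1], means, priors, phi)
```

As phi approaches 0 the negative binomial log-probability above approaches the Poisson log-probability with the same mean, which is one way to see the connection between the two classifiers that the abstract mentions.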
Pages: 10