NBLDA: negative binomial linear discriminant analysis for RNA-Seq data

Cited by: 27
Authors
Dong, Kai [1]
Zhao, Hongyu [2]
Tong, Tiejun [1]
Wan, Xiang [3, 4]
Affiliations
[1] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
[2] Yale Univ, Dept Biostat, New Haven, CT 06510 USA
[3] Hong Kong Baptist Univ, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China
[4] Hong Kong Baptist Univ, Inst Computat & Theoret Studies, Kowloon Tong, Hong Kong, Peoples R China
Source
BMC Bioinformatics | 2016 / Vol. 17
Funding
National Natural Science Foundation of China; US National Institutes of Health
Keywords
RNA-Seq; Negative binomial distribution; Linear discriminant analysis; Identifying differential expression; Classification; Dispersion; Tumors
DOI
10.1186/s12859-016-1208-1
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: RNA sequencing (RNA-Seq) has become a powerful technology for characterizing gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods developed for microarray data can be applied to RNA-Seq data, they are not ideal because RNA-Seq data are discrete. The Poisson and negative binomial distributions are commonly used to model count data. Recently, Witten (Annals Appl Stat 5: 2493-2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. When biological replicates are available and the data are overdispersed (i.e., the variance exceeds the mean), the Poisson assumption may be less appropriate than a negative binomial model. However, negative binomial variables are harder to model because they involve a dispersion parameter that must be estimated.

Results: In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. Using Bayes' rule, we construct the classifier by fitting a negative binomial model and propose several plug-in rules to estimate its unknown parameters. We explore the relationship between the negative binomial classifier and the Poisson classifier and numerically investigate the impact of dispersion on the discriminant score. Simulation results show the superiority of the proposed method. We also analyze two real RNA-Seq data sets to demonstrate its advantages in real-world applications.

Conclusions: We have developed a new classifier for RNA-Seq data based on the negative binomial model. Our simulation results show that the proposed classifier performs better than existing methods, and it can serve as an effective tool for classifying RNA-Seq data. Based on the comparison results, we provide guidelines to help scientists decide which method to use in the discriminant analysis of RNA-Seq data. R code is available at http://www.comp.hkbu.edu.hk/~xwan/NBLDA.R or https://github.com/yangchadam/NBLDA
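The abstract does not state the discriminant score itself, so the following is only a minimal sketch under stated assumptions, not the authors' NBLDA R code. It assumes Witten's PLDA mean model, in which gene j of a test sample in class k has mean mu = s * g_j * d_kj (s a sample size factor, g_j a gene total, d_kj a class-specific gene effect), and a negative binomial with variance mu + phi_j * mu^2. Dropping the terms that do not depend on k from the NB log-likelihood gives the score below. The function names `nb_score`, `poisson_score`, and `classify` and all parameter values are hypothetical, and the paper's plug-in estimation rules are omitted entirely.

```python
import math
from typing import Sequence


def nb_score(x: Sequence[int], s: float, g: Sequence[float],
             d_k: Sequence[float], phi: Sequence[float], prior_k: float) -> float:
    """Assumed NB discriminant score for class k:
    log(pi_k) + sum_j [x_j*log(d_kj) - (x_j + 1/phi_j)*log(1 + s*g_j*d_kj*phi_j)].
    Class-independent terms of the log-likelihood are dropped."""
    score = math.log(prior_k)
    for x_j, g_j, d_kj, phi_j in zip(x, g, d_k, phi):
        mu = s * g_j * d_kj  # assumed NB mean of gene j under class k
        if phi_j == 0.0:
            penalty = mu  # Poisson limit of the NB penalty term
        else:
            penalty = (x_j + 1.0 / phi_j) * math.log1p(mu * phi_j)
        score += x_j * math.log(d_kj) - penalty  # requires d_kj > 0
    return score


def poisson_score(x: Sequence[int], s: float, g: Sequence[float],
                  d_k: Sequence[float], prior_k: float) -> float:
    """Poisson (PLDA-style) score: log(pi_k) + sum_j [x_j*log(d_kj) - s*g_j*d_kj]."""
    return math.log(prior_k) + sum(
        x_j * math.log(d_kj) - s * g_j * d_kj for x_j, g_j, d_kj in zip(x, g, d_k)
    )


def classify(x, s, g, d, phi, priors) -> int:
    """Bayes rule: return the index of the class with the largest NB score."""
    scores = [nb_score(x, s, g, d_k, phi, p) for d_k, p in zip(d, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy parameters (not from the paper): 3 genes, 2 classes.
g = [10.0, 20.0, 5.0]
d = [[1.5, 0.8, 1.0], [0.6, 1.2, 1.0]]
phi = [0.2, 0.2, 0.2]
priors = [0.5, 0.5]
x = [16, 14, 5]

print(classify(x, 1.0, g, d, phi, priors))  # 0
# With near-zero dispersion the NB score approaches the Poisson score.
tiny = [1e-8] * 3
print(abs(nb_score(x, 1.0, g, d[0], tiny, 0.5)
          - poisson_score(x, 1.0, g, d[0], 0.5)) < 1e-4)  # True
```

Under these assumptions, (x_j + 1/phi_j) * log(1 + mu * phi_j) tends to mu as phi_j goes to 0, so the NB score reduces to the Poisson score. This is one way to read the relationship between the two classifiers that the abstract mentions. A larger phi_j shrinks the penalty a gene contributes, which is where dispersion enters the score.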
Pages: 10