NBLDA: negative binomial linear discriminant analysis for RNA-Seq data

Cited by: 27
Authors
Dong, Kai [1]
Zhao, Hongyu [2]
Tong, Tiejun [1]
Wan, Xiang [3, 4]
Affiliations
[1] Hong Kong Baptist Univ, Dept Math, Kowloon Tong, Hong Kong, Peoples R China
[2] Yale Univ, Dept Biostat, New Haven, CT 06510 USA
[3] Hong Kong Baptist Univ, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China
[4] Hong Kong Baptist Univ, Inst Computat & Theoret Studies, Kowloon Tong, Hong Kong, Peoples R China
Source
BMC BIOINFORMATICS | 2016 / Vol. 17
Funding
National Institutes of Health (USA); National Natural Science Foundation of China;
Keywords
RNA-Seq; Negative binomial distribution; Linear discriminant analysis; IDENTIFYING DIFFERENTIAL EXPRESSION; CLASSIFICATION; DISPERSION; TUMORS;
DOI
10.1186/s12859-016-1208-1
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods that have been developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and negative binomial distribution are commonly used to model count data. Recently, Witten (Annals Appl Stat 5: 2493-2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as the negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than the mean). However, it is more complicated to model negative binomial variables because they involve a dispersion parameter that needs to be estimated.
Results: In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications.
Conclusions: We have developed a new classifier using the negative binomial model for RNA-Seq data classification. Our simulation results show that the proposed classifier performs better than existing methods. The proposed classifier can serve as an effective tool for classifying RNA-Seq data. Based on the comparison results, we have provided some guidelines for scientists to decide which method should be used in the discriminant analysis of RNA-Seq data. R code is available at http://www.comp.hkbu.edu.hk/~xwan/NBLDA.R or https://github.com/yangchadam/NBLDA
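To make the idea of a negative binomial discriminant score concrete, the following is a minimal Python sketch, not the authors' R code. It assumes a mean parametrization in the style of Witten's Poisson model, mu = s * lambda_g * d_kg (sample size factor, gene total, class effect), with a gene-wise dispersion phi_g such that the variance is mu + phi_g * mu^2. The function and argument names are hypothetical, and the parameters are taken as given rather than estimated by the plug-in rules the abstract mentions. The score keeps only the class-dependent terms of the negative binomial log-likelihood plus the log prior.

```python
import numpy as np


def nb_discriminant_scores(x, size_factor, lam, d, phi, prior):
    """Class scores for one sample under an assumed negative binomial model.

    x: (G,) counts; size_factor: scalar; lam: (G,) gene totals;
    d: (K, G) class effects; phi: (G,) dispersions; prior: (K,) class priors.
    """
    x = np.asarray(x, dtype=float)
    mu = size_factor * np.asarray(lam) * np.asarray(d)   # (K, G) class means
    log_term = np.log1p(mu * np.asarray(phi))            # log(1 + mu * phi)
    # Class-dependent part of the NB log-likelihood, gene by gene.
    per_gene = x * (np.log(d) - log_term) - log_term / np.asarray(phi)
    return per_gene.sum(axis=1) + np.log(prior)


def poisson_discriminant_scores(x, size_factor, lam, d, prior):
    """Class scores under the corresponding Poisson model (no dispersion)."""
    x = np.asarray(x, dtype=float)
    mu = size_factor * np.asarray(lam) * np.asarray(d)
    return (x * np.log(d) - mu).sum(axis=1) + np.log(prior)


# Illustrative (made-up) parameters: 3 genes, 2 classes.
lam = np.array([10.0, 20.0, 5.0])
d = np.array([[2.0, 1.0, 0.5],
              [0.5, 1.0, 2.0]])
phi = np.full(3, 0.1)
prior = np.array([0.5, 0.5])

scores = nb_discriminant_scores([22, 19, 2], 1.0, lam, d, phi, prior)
predicted_class = int(np.argmax(scores))  # assign to the highest-scoring class
```

Under this parametrization, letting phi shrink toward zero makes the negative binomial score approach the Poisson score, which is one way to see the relationship between the two classifiers that the abstract refers to.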
Pages: 10
Related papers
50 in total
  • [1] NBLDA: negative binomial linear discriminant analysis for RNA-Seq data
    Kai Dong
    Hongyu Zhao
    Tiejun Tong
    Xiang Wan
    [J]. BMC Bioinformatics, 17
  • [2] Negative binomial additive model for RNA-Seq data analysis
    Xu Ren
    Pei-Fen Kuan
    [J]. BMC Bioinformatics, 21
  • [3] Negative binomial additive model for RNA-Seq data analysis
    Ren Xu
    Kuan Pei-Fen
    [J]. BMC BIOINFORMATICS, 2020, 21 (01)
  • [4] Bayesian Analysis of RNA-Seq Data Using a Family of Negative Binomial Models
    Zhao, Lili
    Wu, Weisheng
    Feng, Dai
    Jiang, Hui
    Nguyen, XuanLong
    [J]. BAYESIAN ANALYSIS, 2018, 13 (02) : 411 - 436
  • [5] A SPARSE NEGATIVE BINOMIAL CLASSIFIER WITH COVARIATE ADJUSTMENT FOR RNA-SEQ DATA
    Rahman, Tanbin
    Huang, Hsin-En
    Li, Yujia
    Tai, An-Shun
    Hsieh, Wen-Ping
    McClung, Colleen A.
    Tseng, George
    [J]. ANNALS OF APPLIED STATISTICS, 2022, 16 (02) : 1071 - 1089
  • [6] Marginal likelihood estimation of negative binomial parameters with applications to RNA-seq data
    Leon-Novelo, Luis
    Fuentes, Claudio
    Emerson, Sarah
    [J]. BIOSTATISTICS, 2017, 18 (04) : 637 - 650
  • [7] A sparse negative binomial mixture model for clustering RNA-seq count data
    Li, Yujia
    Rahman, Tanbin
    Ma, Tianzhou
    Tang, Lu
    Tseng, George C.
    [J]. BIOSTATISTICS, 2022, 24 (01) : 68 - 84
  • [8] Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model
    Li, Xiaohong
    Wu, Dongfeng
    Cooper, Nigel G. F.
    Rai, Shesh N.
    [J]. STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2019, 18 (01)
  • [9] Data-based RNA-seq simulations by binomial thinning
    Gerard, David
    [J]. BMC BIOINFORMATICS, 2020, 21 (01)
  • [10] Data-based RNA-seq simulations by binomial thinning
    David Gerard
    [J]. BMC Bioinformatics, 21