Novel Data Transformations for RNA-seq Differential Expression Analysis

Cited by: 16
Authors
Zhang, Zeyu [1]
Yu, Danyang [2]
Seo, Minseok [3]
Hersh, Craig P. [3]
Weiss, Scott T. [3]
Qiu, Weiliang [3]
Affiliations
[1] Tongji Univ, Sch Life Sci & Technol, Dept Bioinformat, Shanghai, Peoples R China
[2] Hunan Univ, Coll Math & Econometr, Dept Informat & Comp Sci, Changsha, Hunan, Peoples R China
[3] Harvard Med Sch, Brigham & Womens Hosp, Channing Div Network Med, Boston, MA 02115 USA
Keywords
REPRODUCIBILITY;
DOI
10.1038/s41598-019-41315-w
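Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];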
Discipline classification code
07; 0710; 09;
Abstract
We propose eight data transformations (r, r2, rv, rv2, l, l2, lv, and lv2) for RNA-seq data analysis. Because it is not always possible to transform count data to satisfy the normality assumption, these transformations aim instead to make the transformed sample mean representative of the distribution center. Simulation studies showed that for data sets with small (e.g., nCases = nControls = 3) or large (e.g., nCases = nControls = 100) sample sizes, limma based on data from the l, l2, and r2 transformations performed better than limma based on data from the voom transformation in terms of accuracy, FDR, and FNR. For data sets with moderate sample sizes (e.g., nCases = nControls = 30 or 50), limma with the rv and rv2 transformations performed similarly to limma with the voom transformation. Real data analysis results were consistent with the simulation results: limma with the r, l, r2, and l2 transformations performed better than limma with the voom transformation when sample sizes were small or large; limma with the rv and rv2 transformations performed similarly to limma with the voom transformation when sample sizes were moderate. We also observed in our data analyses that for data sets with large sample sizes, gene selection via the Wilcoxon rank sum test (a non-parametric two-sample test) based on the raw data outperformed limma based on the transformed data.
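The last observation in the abstract, gene selection with the Wilcoxon rank sum test on raw counts, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy counts, the large-sample normal approximation (with tie correction, no continuity correction), and the Benjamini-Hochberg adjustment are all assumptions chosen for illustration. The eight transformations are not defined in this record, so they are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(cases), len(controls)
    n = n1 + n2
    pooled = list(cases) + list(controls)
    ranks = midranks(pooled)
    w = sum(ranks[:n1])                 # rank sum of the case group
    mean_w = n1 * (n + 1) / 2.0         # expectation under the null
    tie_counts: Dict[float, int] = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:                      # all values identical: no evidence
        return 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):        # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw counts: gene -> (case counts, control counts).
toy_counts: Dict[str, Tuple[List[int], List[int]]] = {
    "gene_shifted": ([50, 52, 55, 57, 60, 61, 64, 66, 70, 72],
                     [10, 11, 13, 14, 16, 18, 19, 21, 22, 25]),
    "gene_null": ([1, 3, 5, 7, 9, 11, 13, 15, 17, 19],
                  [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]),
}

pvals = [rank_sum_pvalue(ca, co) for ca, co in toy_counts.values()]
adj = benjamini_hochberg(pvals)
for gene, p, q in zip(toy_counts, pvals, adj):
    print(f"{gene}: p = {p:.4g}, BH-adjusted p = {q:.4g}")
```

The normal approximation is only reasonable when both groups are fairly large, which matches the large-sample setting the abstract refers to; for very small groups an exact test would be the safer choice.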
Pages: 12
Published in: Scientific Reports, 9