Bias detection and correction in RNA-Sequencing data

Cited by: 108
Authors
Zheng, Wei [1]
Chung, Lisa M. [2]
Zhao, Hongyu [1,2]
Affiliations
[1] Yale Univ, Keck Lab, New Haven, CT 06510 USA
[2] Yale Univ, Sch Publ Hlth, Div Biostat, New Haven, CT 06510 USA
Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Funding
National Institutes of Health (USA);
Keywords
GENE-EXPRESSION; DIFFERENTIAL EXPRESSION; SEQ; NORMALIZATION; TRANSCRIPTOME; RESOLUTION;
DOI
10.1186/1471-2105-12-290
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: High-throughput sequencing technology provides unprecedented opportunities to study transcriptome dynamics. Compared to microarray-based gene expression profiling, RNA-Seq has many advantages, such as high resolution, low background, and the ability to identify novel transcripts. Moreover, for genes with multiple isoforms, the expression of each isoform may be estimated from RNA-Seq data. Despite these advantages, recent work revealed that base-level read counts from RNA-Seq data may not be randomly distributed and can be affected by local nucleotide composition. It was not clear, however, how the base-level read count bias may affect gene-level expression estimates. Results: Using five published RNA-Seq data sets from different biological sources and with different data preprocessing schemes, we showed that commonly used estimates of gene expression levels from RNA-Seq data, such as reads per kilobase of gene length per million reads (RPKM), are biased with respect to gene length, GC content, and dinucleotide frequencies. We directly examined the biases at the gene level and proposed a simple approach, based on a generalized additive model, to correct different sources of bias simultaneously. Compared to previously proposed base-level correction methods, our method reduces bias in gene-level expression estimates more effectively. Conclusions: Our method identifies and corrects different sources of bias in gene-level expression measures from RNA-Seq data, and provides more accurate estimates of gene expression levels from RNA-Seq. This method should prove useful in meta-analyses of gene expression levels using different platforms or experimental protocols.
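For readers unfamiliar with the RPKM measure named in the abstract, the following is a minimal sketch of how it is conventionally computed, together with a helper for GC content, one of the covariates the abstract reports as a source of bias. The function names `rpkm` and `gc_content` and the example numbers are illustrative assumptions; this is not the authors' code and it does not implement their correction method.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads.

    Equivalent to read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical gene: 500 reads, 2,000 bp long, in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
example_gc = gc_content("ATGC")
```

Because RPKM divides by gene length and library size only, any residual dependence of the value on gene length, GC content, or dinucleotide frequencies is the kind of gene-level bias the abstract describes.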
Pages: 14