Detecting and correcting systematic variation in large-scale RNA sequencing data

Times cited: 121
Authors
Li, Sheng [1, 2]
Labaj, Pawel P. [3]
Zumbo, Paul [1, 2]
Sykacek, Peter [3]
Shi, Wei [4]
Shi, Leming [5, 6, 7]
Phan, John [8]
Wu, Po-Yen [8]
Wang, May [8]
Wang, Charles [9, 10]
Thierry-Mieg, Danielle [11]
Thierry-Mieg, Jean [11]
Kreil, David P. [3, 12]
Mason, Christopher E. [1, 2, 13]
Affiliations
[1] Weill Cornell Med Coll, Dept Physiol & Biophys, New York, NY 10065 USA
[2] Weill Cornell Med Coll, HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsau, New York, NY USA
[3] Boku Univ Vienna, Bioinformat Res Grp, Vienna, Austria
[4] WEHI, Dept Bioinformat, Melbourne, Vic, Australia
[5] Fudan Univ, State Key Lab Genet Engn, Sch Life Sci, Shanghai 200433, Peoples R China
[6] Fudan Univ, MOE Key Lab Contemporary Anthropol, Sch Life Sci, Shanghai 200433, Peoples R China
[7] Fudan Univ, Sch Pharm, Shanghai 200433, Peoples R China
[8] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
[9] Loma Linda Univ, Ctr Genom, Loma Linda, CA 92350 USA
[10] Loma Linda Univ, Sch Med, Div Microbiol & Mol Genet, Loma Linda, CA USA
[11] Natl Ctr Biotechnol Informat, Bethesda, MD USA
[12] Univ Warwick, Coventry CV4 7AL, W Midlands, England
[13] Feil Family Brain & Mind Res Inst, New York, NY USA
Funding
US National Institutes of Health (NIH);
Keywords
QUALITY-CONTROL; GENE-EXPRESSION; DIFFERENTIAL EXPRESSION; UNWANTED VARIATION; MESSENGER-RNA; SEQ; NORMALIZATION; TRANSCRIPTS; ALGORITHMS; PACKAGE;
DOI
10.1038/nbt.3000
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
High-throughput RNA sequencing (RNA-seq) enables comprehensive scans of entire transcriptomes, but best practices for analyzing RNA-seq data have not been fully defined, particularly for data collected with multiple sequencing platforms or at multiple sites. Here we used standardized RNA samples with built-in controls to examine sources of error in large-scale RNA-seq studies and their impact on the detection of differentially expressed genes (DEGs). Analysis of variations in guanine-cytosine content, gene coverage, sequencing error rate and insert size allowed identification of decreased reproducibility across sites. Moreover, commonly used methods for normalization (cqn, EDASeq, RUV2, sva, PEER) varied in their ability to remove these systematic biases, depending on sample complexity and initial data quality. Normalization methods that combine data from genes across sites are strongly recommended to identify and remove site-specific effects and can substantially improve RNA-seq studies.
Pages: 888-895
Number of pages: 8
Source: Nature Biotechnology, 2014, 32: 888-895