RNA-QC-chain: comprehensive and fast quality control for RNA-Seq data

Cited by: 38
Authors
Zhou, Qian [1, 2]
Su, Xiaoquan [3, 4, 5]
Jing, Gongchao [3, 4]
Chen, Songlin [1, 2]
Ning, Kang [6]
Affiliations
[1] Chinese Acad Fishery Sci, Key Lab Sustainable Dev Marine Fisheries, Minist Agr, Yellow Sea Fisheries Res Inst, Qingdao 266071, Shandong, Peoples R China
[2] Qingdao Natl Lab Marine Sci & Technol, Lab Marine Fisheries Sci & Food Prod Proc, Qingdao 266071, Shandong, Peoples R China
[3] Chinese Acad Sci, Qingdao Inst Bioenergy & Bioproc Technol, CAS Key Lab Biofuels, Shandong Key Lab Energy Genet, Qingdao 266101, Shandong, Peoples R China
[4] Chinese Acad Sci, Qingdao Inst Bioenergy & Bioproc Technol, Single Cell Ctr, Qingdao 266101, Shandong, Peoples R China
[5] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[6] Huazhong Univ Sci & Technol, Coll Life Sci & Technol, Hubei Key Lab Bioinformat & Mol Imaging, Minist Educ, Key Lab Mol Biophys, Dept Bioinformat, Wuhan 430074, Hubei, Peoples R China
Source
BMC GENOMICS | 2018 / Vol. 19
Funding
National Natural Science Foundation of China;
Keywords
Quality control; RNA-Seq; Contamination identification; Alignment statistics; Parallel computing;
DOI
10.1186/s12864-018-4503-6
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: RNA-Seq has become one of the most widely used applications based on next-generation sequencing technology. However, raw RNA-Seq data may have quality issues, which can significantly distort analytical results and lead to erroneous conclusions. Therefore, the raw data must be subjected to rigorous quality control (QC) procedures before downstream analysis. Currently, an accurate and complete QC of RNA-Seq data requires a suite of different QC tools used consecutively, which is inefficient in terms of usability, running time, file usage, and interpretability of the results.
Results: We developed a comprehensive, fast and easy-to-use QC pipeline for RNA-Seq data, RNA-QC-Chain, which involves three steps: (1) sequencing-quality assessment and trimming; (2) internal (ribosomal RNAs) and external (reads from foreign species) contamination filtering; (3) alignment statistics reporting (such as read number, alignment coverage, sequencing depth and paired-end read mapping information). This package was developed based on our previously reported tool for general QC of next-generation sequencing (NGS) data called QC-Chain, with extensions specifically designed for RNA-Seq data. It has several features that are not yet available in other QC tools for RNA-Seq data, such as RNA sequence trimming, automatic rRNA detection and automatic contaminating species identification. The three QC steps can run either sequentially or independently, making RNA-QC-Chain a comprehensive package with high flexibility and usability. Moreover, parallel computing and optimizations are embedded in most of the QC procedures, providing superior efficiency. The performance of RNA-QC-Chain has been evaluated with different types of datasets, including an in-house sequencing dataset, a semi-simulated dataset, and two real datasets downloaded from public databases. Comparisons of RNA-QC-Chain with other QC tools have demonstrated its advantages in both functional versatility and processing speed.
Conclusions: We present here a tool, RNA-QC-Chain, which can be used to comprehensively resolve the quality control processes of RNA-Seq data effectively and efficiently.
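To make step (1) of the abstract concrete, the following is a minimal Python sketch of 3'-end quality trimming on FASTQ-style reads. It is an illustration only and not RNA-QC-Chain's actual algorithm or interface: the function names (phred33, trim_3prime, qc_reads), the Phred+33 encoding, and the thresholds min_q=20 and min_len=5 are all assumptions chosen for the example.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual]


def trim_3prime(seq, qual, min_q=20):
    """Remove bases from the 3' end until one with score >= min_q is reached."""
    scores = phred33(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def qc_reads(records, min_q=20, min_len=5):
    """Yield trimmed (name, seq, qual) records; drop reads shorter than min_len."""
    for name, seq, qual in records:
        t_seq, t_qual = trim_3prime(seq, qual, min_q)
        if len(t_seq) >= min_len:
            yield name, t_seq, t_qual


# Hypothetical toy reads: 'I' encodes Q40, '#' encodes Q2.
reads = [
    ("read1", "ACGTACGTAC", "IIIIIIII##"),  # two low-quality 3' bases
    ("read2", "ACGTACGTAC", "II########"),  # mostly low quality
]
kept = list(qc_reads(reads))
```

In this toy input, read1 loses its last two bases and is kept, while read2 is trimmed to two bases and discarded for falling below the assumed minimum length. A real pipeline would additionally handle paired-end reads and run in parallel, as the abstract describes.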
Pages: 10
Related papers
50 in total
  • [21] RNA editing in the human ENCODE RNA-seq data
    Park, Eddie
    Williams, Brian
    Wold, Barbara J.
    Mortazavi, Ali
    GENOME RESEARCH, 2012, 22 (09) : 1626 - 1633
  • [22] The impact of quality filter for RNA-Seq
    de Sa, Pablo H. C. G.
    Veras, Adonney A. O.
    Carneiro, Adriana R.
    Pinheiro, Kenny C.
    Pinto, Anne C.
    Soares, Siomar C.
    Schneider, Maria P. C.
    Azevedo, Vasco
    Silva, Artur
    Ramos, Rommel T. J.
    GENE, 2015, 563 (02) : 165 - 171
  • [24] grandR: a comprehensive package for nucleotide conversion RNA-seq data analysis
    Rummel, Teresa
    Sakellaridi, Lygeri
    Erhard, Florian
    NATURE COMMUNICATIONS, 2023, 14 (01)
  • [25] A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium
    Su, Zhenqiang
    Labaj, Pawel P.
    Li, Sheng
    Thierry-Mieg, Jean
    Thierry-Mieg, Danielle
    Shi, Wei
    Wang, Charles
    Schroth, Gary P.
    Setterquist, Robert A.
    Thompson, John F.
    Jones, Wendell D.
    Xiao, Wenzhong
    Xu, Weihong
    Jensen, Roderick V.
    Kelly, Reagan
    Xu, Joshua
    Conesa, Ana
    Furlanello, Cesare
    Gao, Hanlin
    Hong, Huixiao
    Jafari, Nadereh
    Letovsky, Stan
    Liao, Yang
    Lu, Fei
    Oakeley, Edward J.
    Peng, Zhiyu
    Praul, Craig A.
    Santoyo-Lopez, Javier
    Scherer, Andreas
    Shi, Tieliu
    Smyth, Gordon K.
    Staedtler, Frank
    Sykacek, Peter
    Tan, Xin-Xing
    Thompson, E. Aubrey
    Vandesompele, Jo
    Wang, May D.
    Wang, Jian
    Wolfinger, Russell D.
    Zavadil, Jiri
    Auerbach, Scott S.
    Bao, Wenjun
    Binder, Hans
    Blomquist, Thomas
    Brilliant, Murray H.
    Bushel, Pierre R.
    Cain, Weimin
    Catalano, Jennifer G.
    Chang, Ching-Wei
    Chen, Tao
    NATURE BIOTECHNOLOGY, 2014, 32 (09) : 903 - 914
  • [26] Dimensionality Reduction of RNA-Seq Data
    Al-Turaiki, Isra
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2021, 21 (03) : 31 - 36
  • [27] grandR: a comprehensive package for nucleotide conversion RNA-seq data analysis
    Teresa Rummel
    Lygeri Sakellaridi
    Florian Erhard
    Nature Communications, 14
  • [28] scRNABatchQC: multi-samples quality control for single cell RNA-seq data
    Liu, Qi
    Sheng, Quanhu
    Ping, Jie
    Ramirez, Marisol Adelina
    Lau, Ken S.
    Coffey, Robert J.
    Shyr, Yu
    BIOINFORMATICS, 2019, 35 (24) : 5306 - 5308
  • [29] OneStopRNAseq: A Web Application for Comprehensive and Efficient Analyses of RNA-Seq Data
    Li, Rui
    Hu, Kai
    Liu, Haibo
    Green, Michael R.
    Zhu, Lihua Julie
    GENES, 2020, 11 (10) : 1 - 14
  • [30] Quality control of single-cell RNA-seq by SinQC
    Jiang, Peng
    Thomson, James A.
    Stewart, Ron
    BIOINFORMATICS, 2016, 32 (16) : 2514 - 2516