RNA-QC-chain: comprehensive and fast quality control for RNA-Seq data

Cited by: 38
Authors
Zhou, Qian [1, 2]
Su, Xiaoquan [3, 4, 5]
Jing, Gongchao [3, 4]
Chen, Songlin [1, 2]
Ning, Kang [6]
Affiliations
[1] Chinese Acad Fishery Sci, Key Lab Sustainable Dev Marine Fisheries, Minist Agr, Yellow Sea Fisheries Res Inst, Qingdao 266071, Shandong, Peoples R China
[2] Qingdao Natl Lab Marine Sci & Technol, Lab Marine Fisheries Sci & Food Prod Proc, Qingdao 266071, Shandong, Peoples R China
[3] Chinese Acad Sci, Qingdao Inst Bioenergy & Bioproc Technol, CAS Key Lab Biofuels, Shandong Key Lab Energy Genet, Qingdao 266101, Shandong, Peoples R China
[4] Chinese Acad Sci, Qingdao Inst Bioenergy & Bioproc Technol, Single Cell Ctr, Qingdao 266101, Shandong, Peoples R China
[5] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[6] Huazhong Univ Sci & Technol, Coll Life Sci & Technol, Hubei Key Lab Bioinformat & Mol Imaging, Minist Educ,Key Lab Mol Biophys,Dept Bioinformat, Wuhan 430074, Hubei, Peoples R China
Source
BMC GENOMICS | 2018 / Vol. 19
Funding
National Natural Science Foundation of China;
Keywords
Quality control; RNA-Seq; Contamination identification; Alignment statistics; Parallel computing;
DOI
10.1186/s12864-018-4503-6
Chinese Library Classification number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: RNA-Seq has become one of the most widely used applications of next-generation sequencing technology. However, raw RNA-Seq data may have quality issues, which can significantly distort analytical results and lead to erroneous conclusions. Therefore, the raw data must be subjected to rigorous quality control (QC) procedures before downstream analysis. Currently, an accurate and complete QC of RNA-Seq data requires a suite of different QC tools used consecutively, which is inefficient in terms of usability, running time, file usage, and interpretability of the results.
Results: We developed a comprehensive, fast and easy-to-use QC pipeline for RNA-Seq data, RNA-QC-Chain, which involves three steps: (1) sequencing-quality assessment and trimming; (2) internal (ribosomal RNAs) and external (reads from foreign species) contamination filtering; (3) alignment statistics reporting (such as read number, alignment coverage, sequencing depth and paired-end read mapping information). This package was developed based on our previously reported tool for general QC of next-generation sequencing (NGS) data, QC-Chain, with extensions specifically designed for RNA-Seq data. It has several features that are not yet available in other QC tools for RNA-Seq data, such as RNA sequence trimming, automatic rRNA detection and automatic identification of contaminating species. The three QC steps can run either sequentially or independently, making RNA-QC-Chain a comprehensive package with high flexibility and usability. Moreover, parallel computing and optimizations are embedded in most of the QC procedures, providing superior efficiency. The performance of RNA-QC-Chain has been evaluated with different types of datasets, including an in-house sequencing dataset, a semi-simulated dataset, and two real datasets downloaded from public databases. Comparisons of RNA-QC-Chain with other QC tools demonstrated its advantages in both functional versatility and processing speed.
Conclusions: We present here a tool, RNA-QC-Chain, which can be used to carry out the quality control of RNA-Seq data comprehensively, effectively and efficiently.
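To make step (1) of the abstract more concrete, the following is a minimal Python sketch of 3'-end quality trimming on a single FASTQ read. It is an illustration only and not RNA-QC-Chain's implementation: the function names (`phred_scores`, `trim_3prime`, `qc_read`), the Phred+33 quality encoding, and the thresholds `min_q` and `min_len` are all assumptions chosen for the example.

```python
from typing import List, Optional, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> Tuple[str, str]:
    """Drop bases from the 3' end until one with quality >= min_q is found."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def qc_read(seq: str, quality: str, min_q: int = 20,
            min_len: int = 5) -> Optional[Tuple[str, str]]:
    """Trim a read and discard it (return None) if it becomes too short."""
    trimmed_seq, trimmed_qual = trim_3prime(seq, quality, min_q)
    if len(trimmed_seq) < min_len:
        return None
    return trimmed_seq, trimmed_qual


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(qc_read("ACGTACGTAC", "IIIIIIII##"))
    print(qc_read("ACGT", "I###"))
```

In a real pipeline the same per-read function would be applied to every record of a FASTQ file, which is the kind of independent per-read work that parallelizes well.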
Pages: 10
Related papers
50 in total
  • [11] An automated quality control pipeline for eQTL analysis with RNA-seq data
    Wang, Tao
    Ruan, Junpeng
    Yin, Quanwei
    Dong, Xianjun
    Wang, Yadong
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 1780 - 1786
  • [12] Quality Control for RNA-Seq (QuaCRS): An Integrated Quality Control Pipeline
    Kroll, Karl W.
    Mokaram, Nima E.
    Pelletier, Alexander R.
    Frankhouser, David E.
    Westphal, Maximillian S.
    Stump, Paige A.
    Stump, Cameron L.
    Bundschuh, Ralf
    Blachly, James S.
    Yan, Pearlly
    CANCER INFORMATICS, 2014, 13 : 7 - 14
  • [13] A pipeline for RNA-seq data processing and quality assessment
    Goncalves, Angela
    Tikhonov, Andrew
    Brazma, Alvis
    Kapushesky, Misha
    BIOINFORMATICS, 2011, 27 (06) : 867 - 869
  • [15] Bubble: a fast single-cell RNA-seq imputation using an autoencoder constrained by bulk RNA-seq data
    Chen, Siqi
    Yan, Xuhua
    Zheng, Ruiqing
    Li, Min
    BRIEFINGS IN BIOINFORMATICS, 2023, 24 (01)
  • [16] iSmaRT: a toolkit for a comprehensive analysis of small RNA-Seq data
    Panero, Riccardo
    Rinaldi, Antonio
    Memoli, Domenico
    Nassa, Giovanni
    Ravo, Maria
    Rizzo, Francesca
    Tarallo, Roberta
    Milanesi, Luciano
    Weisz, Alessandro
    Giurato, Giorgio
    BIOINFORMATICS, 2017, 33 (06) : 938 - 940
  • [17] COMPSRA: a COMprehensive Platform for Small RNA-Seq data Analysis
    Li, Jiang
    Kho, Alvin T.
    Chase, Robert P.
    Pantano, Lorena
    Farnam, Leanna
    Amr, Sami S.
    Tantisira, Kelan G.
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [18] Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome
    Peng, Zhiyu
    Cheng, Yanbing
    Tan, Bertrand Chin-Ming
    Kang, Lin
    Tian, Zhijian
    Zhu, Yuankun
    Zhang, Wenwei
    Liang, Yu
    Hu, Xueda
    Tan, Xuemei
    Guo, Jing
    Dong, Zirui
    Liang, Yan
    Bao, Li
    Wang, Jun
    NATURE BIOTECHNOLOGY, 2012, 30 (03) : 253 - 260
  • [20] RNA-SeQC: RNA-seq metrics for quality control and process optimization
    DeLuca, David S.
    Levin, Joshua Z.
    Sivachenko, Andrey
    Fennell, Timothy
    Nazaire, Marc-Danie
    Williams, Chris
    Reich, Michael
    Winckler, Wendy
    Getz, Gad
    BIOINFORMATICS, 2012, 28 (11) : 1530 - 1532