SeqAssist: a novel toolkit for preliminary analysis of next-generation sequencing data

Cited by: 8
Authors
Peng, Yan [1]
Maxwell, Andrew S. [1]
Barker, Natalie D. [2]
Laird, Jennifer G. [3]
Kennedy, Alan J. [3]
Wang, Nan [1]
Zhang, Chaoyang [1]
Gong, Ping [2]
Affiliations
[1] Univ So Mississippi, Sch Comp, Hattiesburg, MS 39406 USA
[2] Badger Tech Serv LLC, San Antonio, TX 78216 USA
[3] US Army, Engn Res & Dev Ctr, Environm Lab, Vicksburg, MS 39180 USA
Source
BMC BIOINFORMATICS | 2014 / Vol. 15
Funding
National Science Foundation (USA);
Keywords
RNA-SEQ; READ ALIGNMENT; GENOME;
DOI
10.1186/1471-2105-15-S11-S10
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: While next-generation sequencing (NGS) technologies are rapidly advancing, an area that lags behind is the development of efficient and user-friendly tools for preliminary analysis of massive NGS data. To fill this gap, keep up with the fast pace of technological advancement, and accelerate data-to-results turnaround, we developed a novel software package named SeqAssist ("Sequencing Assistant" or SA).
Results: SeqAssist takes NGS-generated FASTQ files as the input, employs the BWA-MEM aligner for sequence alignment, and aims to provide a quick overview and basic statistics of NGS data. It consists of three separate workflows: (1) the SA_RunStats workflow generates basic statistics about an NGS dataset, including numbers of raw, cleaned, redundant and unique reads, redundancy rate, and a list of unique sequences with length and read count; (2) the SA_Run2Ref workflow estimates the breadth, depth and evenness of genome-wide coverage of the NGS dataset at a nucleotide resolution; and (3) the SA_Run2Run workflow compares two NGS datasets to determine the redundancy (overlapping rate) between the two NGS runs. Statistics produced by SeqAssist or derived from SeqAssist output files are designed to inform the user whether, what percentage, how many times and how evenly a genomic locus (i.e., gene, scaffold, chromosome or genome) is covered by sequencing reads, and how redundant the sequencing reads are in a single run or between two runs. These statistics can guide the user in evaluating the quality of a DNA library prepared for RNA-Seq or genome (re-)sequencing and in deciding the number of sequencing runs required for the library. We have tested SeqAssist using a synthetic dataset and demonstrated its main features using multiple NGS datasets generated from genome re-sequencing experiments.
Conclusions: SeqAssist is a useful and informative tool that can serve as a valuable "assistant" to a broad range of investigators who conduct genome re-sequencing, RNA-Seq, or de novo genome sequencing and assembly experiments.
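The three kinds of statistics named in the abstract can be illustrated with a small Python sketch. This is not SeqAssist code: the function names (`run_stats`, `coverage_stats`, `run_overlap`) are hypothetical, the inputs are plain lists rather than FASTQ files or BWA-MEM alignments, and the exact definitions used here (redundant reads counted as copies beyond the first, evenness expressed as a coefficient of variation, overlap rate as the shared fraction of unique sequences) are assumptions, since the abstract does not specify them.

```python
from collections import Counter
from statistics import mean, pstdev


def run_stats(reads):
    """Read-level summary, loosely modeled on what SA_RunStats reports."""
    counts = Counter(reads)          # sequence -> number of reads
    total = len(reads)
    unique = len(counts)             # number of distinct sequences
    redundant = total - unique       # assumption: copies beyond the first
    return {
        "total_reads": total,
        "unique_reads": unique,
        "redundant_reads": redundant,
        "redundancy_rate": redundant / total if total else 0.0,
        # (sequence, length, read count), most frequent first
        "unique_table": [(s, len(s), n) for s, n in counts.most_common()],
    }


def coverage_stats(depths):
    """Breadth, mean depth and evenness from per-nucleotide depths.

    Assumption: evenness is summarized by the coefficient of variation
    (population standard deviation / mean); lower means more even.
    """
    n = len(depths)
    if n == 0:
        return {"breadth": 0.0, "mean_depth": 0.0, "cv": 0.0}
    covered = sum(1 for d in depths if d > 0)   # positions with >= 1 read
    avg = mean(depths)
    cv = pstdev(depths) / avg if avg else 0.0
    return {"breadth": covered / n, "mean_depth": avg, "cv": cv}


def run_overlap(reads_a, reads_b):
    """Share of unique sequences that two runs have in common."""
    a, b = set(reads_a), set(reads_b)
    shared = a & b
    return {
        "shared_unique": len(shared),
        "overlap_rate_a": len(shared) / len(a) if a else 0.0,
        "overlap_rate_b": len(shared) / len(b) if b else 0.0,
    }


if __name__ == "__main__":
    run1 = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC"]   # toy reads
    run2 = ["TTGA", "GGCC", "CCAA", "TTGA"]
    print(run_stats(run1))
    print(coverage_stats([0, 2, 2, 4]))               # toy per-base depths
    print(run_overlap(run1, run2))
```

In a real analysis the per-base depths would come from an alignment (the abstract names BWA-MEM as the aligner), and the reads would be parsed from cleaned FASTQ files; both steps are omitted from this sketch.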
Pages: 11