SeqAssist: a novel toolkit for preliminary analysis of next-generation sequencing data

Cited by: 8

Authors:

Peng, Yan [1]

Maxwell, Andrew S. [1]

Barker, Natalie D. [2]

Laird, Jennifer G. [3]

Kennedy, Alan J. [3]

Wang, Nan [1]

Zhang, Chaoyang [1]

Gong, Ping [2]

Affiliations:

[1] Univ So Mississippi, Sch Comp, Hattiesburg, MS 39406 USA

[2] Badger Tech Serv LLC, San Antonio, TX 78216 USA

[3] US Army, Engn Res & Dev Ctr, Environm Lab, Vicksburg, MS 39180 USA

Source:

BMC BIOINFORMATICS | 2014 / Vol. 15

Funding:

U.S. National Science Foundation;

Keywords:

RNA-SEQ; READ ALIGNMENT; GENOME;

DOI:

10.1186/1471-2105-15-S11-S10

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Background: While next-generation sequencing (NGS) technologies are rapidly advancing, the development of efficient and user-friendly tools for preliminary analysis of massive NGS data lags behind. To fill this gap, keep up with the fast pace of technological advancement, and accelerate data-to-results turnaround, we developed a novel software package named SeqAssist ("Sequencing Assistant" or SA).
Results: SeqAssist takes NGS-generated FASTQ files as input, employs the BWA-MEM aligner for sequence alignment, and aims to provide a quick overview and basic statistics of NGS data. It consists of three separate workflows: (1) the SA_RunStats workflow generates basic statistics about an NGS dataset, including numbers of raw, cleaned, redundant and unique reads, redundancy rate, and a list of unique sequences with length and read count; (2) the SA_Run2Ref workflow estimates the breadth, depth and evenness of genome-wide coverage of the NGS dataset at nucleotide resolution; and (3) the SA_Run2Run workflow compares two NGS datasets to determine the redundancy (overlapping rate) between the two NGS runs. Statistics produced by SeqAssist or derived from SeqAssist output files are designed to tell the user whether, to what percentage, how many times and how evenly a genomic locus (i.e., gene, scaffold, chromosome or genome) is covered by sequencing reads, and how redundant the sequencing reads are within a single run or between two runs. These statistics can guide the user in evaluating the quality of a DNA library prepared for RNA-Seq or genome (re-)sequencing and in deciding the number of sequencing runs required for the library. We have tested SeqAssist using a synthetic dataset and demonstrated its main features using multiple NGS datasets generated from genome re-sequencing experiments.
Conclusions: SeqAssist is a useful and informative tool that can serve as a valuable "assistant" to a broad range of investigators who conduct genome re-sequencing, RNA-Seq, or de novo genome sequencing and assembly experiments.
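
The statistics named in the abstract can be illustrated with a minimal Python sketch. This is not SeqAssist code and does not reproduce its algorithms: the function names `run_stats` and `coverage_stats` are hypothetical, "redundant reads" is assumed to mean copies beyond the first occurrence of a sequence, and evenness is assumed to be summarized by the coefficient of variation of per-base depth. The abstract does not state the exact definitions SeqAssist uses, so treat these as placeholders.

```python
from collections import Counter
from statistics import mean, pstdev


def run_stats(reads):
    """Summarize redundancy in a list of read sequences (plain strings).

    Assumption: a read is "redundant" if an identical sequence was
    already seen, so redundant = total - unique.
    """
    counts = Counter(reads)
    total = len(reads)
    unique = len(counts)
    redundant = total - unique
    return {
        "total": total,
        "unique": unique,
        "redundant": redundant,
        "redundancy_rate": redundant / total if total else 0.0,
        # (sequence, length, read count), most frequent first
        "unique_sequences": [(s, len(s), n) for s, n in counts.most_common()],
    }


def coverage_stats(depths):
    """Summarize per-base read depth over a locus (list of integers).

    breadth  = fraction of positions covered by at least one read
    depth    = mean depth over all positions
    evenness = coefficient of variation of depth (assumption; a lower
               value means more even coverage)
    """
    if not depths:
        return {"breadth": 0.0, "mean_depth": 0.0, "depth_cv": 0.0}
    avg = mean(depths)
    return {
        "breadth": sum(1 for d in depths if d > 0) / len(depths),
        "mean_depth": avg,
        "depth_cv": pstdev(depths) / avg if avg else 0.0,
    }
```

In practice the read list would come from a FASTQ file and the depth list from an alignment against a reference; both inputs are passed here as plain Python lists to keep the sketch self-contained.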

Pages: 11
