ReSeq simulates realistic Illumina high-throughput sequencing data

Cited by: 10
Authors
Schmeing, Stephan [1, 2]
Robinson, Mark D. [1, 2]
Affiliations
[1] Univ Zurich, Inst Mol Life Sci, Winterthurerstr 190, CH-8057 Zurich, Switzerland
[2] SIB Swiss Inst Bioinformat, Winterthurerstr 190, CH-8057 Zurich, Switzerland
Keywords
Simulation; Genomic; High-throughput sequencing; Illumina; ERROR PROFILES; RNA-SEQ; BIAS; QUALITY; BENCHMARKING; DISCOVERY; RESOURCE; GENOMES; SNP
DOI
10.1186/s13059-021-02265-7
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to more faithful performance evaluations. ReSeq is available at https://github.com/schmeing/ReSeq.
Pages: 37