High-Throughput Compression of FASTQ Data with SeqDB

Cited by: 23
Author
Howison, Mark [1]
Affiliation
[1] Brown Univ, Ctr Computat & Visualizat, Providence, RI 02912 USA
Keywords
Compression; data storage; next-generation sequencing; FASTQ; SEQUENCE; FORMAT;
DOI
10.1109/TCBB.2012.160
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Compression has become a critical step in storing next-generation sequencing (NGS) data sets because of both the increasing size and decreasing costs of such data. Recent research into efficiently compressing sequence data has focused largely on improving compression ratios. Yet, the throughputs of current methods now lag far behind the I/O bandwidths of modern storage systems. As biologists move their analyses to high-performance systems with greater I/O bandwidth, low-throughput compression becomes a limiting factor. To address this gap, we present a new storage model called SeqDB, which offers high-throughput compression of sequence data with minimal sacrifice in compression ratio. It achieves this by combining the existing multithreaded Blosc compressor with a new data-parallel byte-packing scheme, called SeqPack, which interleaves sequence data and quality scores.
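The idea of interleaving sequence data and quality scores by byte-packing, then handing the packed bytes to a general-purpose compressor, can be illustrated with a minimal sketch. This is not the SeqPack layout from the paper: the five-letter alphabet, the 0..50 quality range, the Phred+33 offset, and the function names `pack`, `unpack`, `compress_record`, and `decompress_record` are all assumptions made for illustration, and Python's standard `zlib` stands in for the multithreaded Blosc compressor.

```python
import zlib
from typing import Tuple

# All constants below are illustrative assumptions, not values taken from the paper.
BASES = "ACGTN"      # assumed nucleotide alphabet (5 symbols)
MAX_QUAL = 50        # assumed quality range 0..50, so 5 * 51 = 255 codes fit in one byte
PHRED_OFFSET = 33    # assumed Phred+33 ASCII encoding of quality characters


def pack(seq: str, qual: str) -> bytes:
    """Pack each (base, quality) pair of a read into a single byte."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    out = bytearray(len(seq))
    for i, (base, q_char) in enumerate(zip(seq, qual)):
        q = ord(q_char) - PHRED_OFFSET
        if not 0 <= q <= MAX_QUAL:
            raise ValueError(f"quality score {q} outside 0..{MAX_QUAL}")
        # BASES.index raises ValueError for symbols outside the assumed alphabet.
        out[i] = BASES.index(base) * (MAX_QUAL + 1) + q
    return bytes(out)


def unpack(packed: bytes) -> Tuple[str, str]:
    """Invert pack(): recover the sequence and quality strings."""
    seq_chars = []
    qual_chars = []
    for byte in packed:
        base_idx, q = divmod(byte, MAX_QUAL + 1)
        seq_chars.append(BASES[base_idx])
        qual_chars.append(chr(q + PHRED_OFFSET))
    return "".join(seq_chars), "".join(qual_chars)


def compress_record(seq: str, qual: str, level: int = 1) -> bytes:
    """Pack a read, then compress it (zlib is a stand-in for Blosc here)."""
    return zlib.compress(pack(seq, qual), level)


def decompress_record(blob: bytes) -> Tuple[str, str]:
    """Decompress and unpack a record produced by compress_record()."""
    return unpack(zlib.decompress(blob))
```

Under these assumptions, packing halves the bytes handed to the compressor, since each base and its quality score share one byte, and both steps are simple per-byte operations that parallelize easily across reads.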
Pages: 213-218
Number of pages: 6
Related papers
50 in total
  • [1] High-Throughput, Lossless Data Compression on FPGAs
    Sukhwani, Bharat
    Abali, Bulent
    Brezzo, Bernard
    Asaad, Sameh
    [J]. 2011 IEEE 19TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2011, : 113 - 116
  • [2] Compression of Structured High-Throughput Sequencing Data
    Campagne, Fabien
    Dorff, Kevin C.
    Chambwe, Nyasha
    Robinson, James T.
    Mesirov, Jill P.
    [J]. PLOS ONE, 2013, 8 (11):
  • [3] High-throughput DNA sequence data compression
    Zhu, Zexuan
    Zhang, Yongpeng
    Ji, Zhen
    He, Shan
    Yang, Xiao
    [J]. BRIEFINGS IN BIOINFORMATICS, 2015, 16 (01) : 1 - 15
  • [4] Comparison of high-throughput sequencing data compression tools
    Numanagic, Ibrahim
    Bonfield, James K.
    Hach, Faraz
    Voges, Jan
    Ostermann, Joern
    Alberti, Claudio
    Mattavelli, Marco
    Sahinalp, S. Cenk
    [J]. NATURE METHODS, 2016, 13 (12) : 1005 - +
  • [5] A high-throughput VLSI architecture for LZFG data compression
    Chen, JM
    Wei, CH
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2002, E85D (03) : 497 - 509
  • [6] Comparison of high-throughput sequencing data compression tools
    Ibrahim Numanagić
    James K Bonfield
    Faraz Hach
    Jan Voges
    Jörn Ostermann
    Claudio Alberti
    Marco Mattavelli
    S Cenk Sahinalp
    [J]. Nature Methods, 2016, 13 : 1005 - 1008
  • [7] High-Throughput BitPacking Compression
    Lisa, Nusrat Jahan
    Nguyen, Tuan D. A.
    Habich, Dirk
    Kumar, Akash
    Lehner, Wolfgang
    [J]. 2019 22ND EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD), 2019, : 643 - 646
  • [8] Data structures and compression algorithms for high-throughput sequencing technologies
    Kenny Daily
    Paul Rigor
    Scott Christley
    Xiaohui Xie
    Pierre Baldi
    [J]. BMC Bioinformatics, 11
  • [9] Data structures and compression algorithms for high-throughput sequencing technologies
    Daily, Kenny
    Rigor, Paul
    Christley, Scott
    Xie, Xiaohui
    Baldi, Pierre
    [J]. BMC BIOINFORMATICS, 2010, 11
  • [10] ISOBAR Preconditioner for Effective and High-throughput Lossless Data Compression
    Schendel, Eric R.
    Jin, Ye
    Shah, Neil
    Chen, Jackie
    Chang, C. S.
    Ku, Seung-Hoe
    Ethier, Stephane
    Klasky, Scott
    Latham, Robert
    Ross, Robert
    Samatova, Nagiza F.
    [J]. 2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 138 - 149