High-Throughput Compression of FASTQ Data with SeqDB

Cited by: 23
Authors
Howison, Mark [1]
Affiliation
[1] Brown Univ, Ctr Computat & Visualizat, Providence, RI 02912 USA
Keywords
Compression; data storage; next-generation sequencing; FASTQ; SEQUENCE; FORMAT;
DOI
10.1109/TCBB.2012.160
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Compression has become a critical step in storing next-generation sequencing (NGS) data sets because of both the increasing size and decreasing costs of such data. Recent research into efficiently compressing sequence data has focused largely on improving compression ratios. Yet, the throughputs of current methods now lag far behind the I/O bandwidths of modern storage systems. As biologists move their analyses to high-performance systems with greater I/O bandwidth, low-throughput compression becomes a limiting factor. To address this gap, we present a new storage model called SeqDB, which offers high-throughput compression of sequence data with minimal sacrifice in compression ratio. It achieves this by combining the existing multithreaded Blosc compressor with a new data-parallel byte-packing scheme, called SeqPack, which interleaves sequence data and quality scores.
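The abstract does not spell out how SeqPack lays out its bytes, so the following is only a minimal sketch of the general idea of interleaving a base and its quality score into a single byte. The alphabet `BASES`, the quality cap `MAX_QUAL`, the Phred+33 offset, and the function names `pack` and `unpack` are all assumptions made for illustration; they are not taken from the paper, and the Blosc compression stage is omitted.

```python
# Hypothetical byte-packing sketch: one (base, quality) pair per byte.
# Assumption: 5 base symbols and quality scores 0..50, so that
# 5 * 51 = 255 distinct codes fit in a single unsigned byte.
BASES = "ACGTN"
MAX_QUAL = 50


def pack(seq: str, qual: str, offset: int = 33) -> bytes:
    """Interleave each base with its quality score into one byte."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    out = bytearray(len(seq))
    for i, (base, q) in enumerate(zip(seq, qual)):
        score = ord(q) - offset
        if not 0 <= score <= MAX_QUAL:
            raise ValueError(f"quality score {score} out of range")
        # BASES.index raises ValueError for symbols outside the alphabet.
        out[i] = BASES.index(base) * (MAX_QUAL + 1) + score
    return bytes(out)


def unpack(packed: bytes, offset: int = 33) -> tuple[str, str]:
    """Invert pack(): recover the sequence and quality strings."""
    seq, qual = [], []
    for code in packed:
        base_idx, score = divmod(code, MAX_QUAL + 1)
        seq.append(BASES[base_idx])
        qual.append(chr(score + offset))
    return "".join(seq), "".join(qual)
```

Under these assumptions a read occupies one byte per position instead of two, and the packed buffer could then be handed to a general-purpose block compressor such as Blosc.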
Pages: 213-218
Page count: 6