High-Throughput Compression of FASTQ Data with SeqDB

Cited by: 23
Authors
Howison, Mark [1]
Affiliations
[1] Brown Univ, Ctr Computat & Visualizat, Providence, RI 02912 USA
Keywords
Compression; data storage; next-generation sequencing; FASTQ; SEQUENCE; FORMAT;
DOI
10.1109/TCBB.2012.160
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Compression has become a critical step in storing next-generation sequencing (NGS) data sets because of both the increasing size and decreasing costs of such data. Recent research into efficiently compressing sequence data has focused largely on improving compression ratios. Yet, the throughputs of current methods now lag far behind the I/O bandwidths of modern storage systems. As biologists move their analyses to high-performance systems with greater I/O bandwidth, low-throughput compression becomes a limiting factor. To address this gap, we present a new storage model called SeqDB, which offers high-throughput compression of sequence data with minimal sacrifice in compression ratio. It achieves this by combining the existing multithreaded Blosc compressor with a new data-parallel byte-packing scheme, called SeqPack, which interleaves sequence data and quality scores.
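The abstract does not spell out how SeqPack lays out its bytes, so the following is only an illustrative sketch of the general idea of interleaving a base and its quality score in a single byte. The names `pack_read` and `unpack_read`, the five-letter alphabet `ACGTN`, the Phred+33 encoding, and the assumed quality ceiling of 50 are all hypothetical choices made for this example, not details taken from the paper. Python's standard `zlib` stands in for Blosc purely to show where a general-purpose compressor would follow the packing step.

```python
import zlib

# Assumptions for this sketch (not taken from the paper):
BASES = "ACGTN"      # assumed base alphabet
MAX_QUAL = 50        # assumed upper bound on the Phred score
PHRED_OFFSET = 33    # assumed Phred+33 ASCII encoding
# 5 bases * 51 quality levels = 255 distinct codes, so each pair fits in one byte.


def pack_read(seq: str, qual: str) -> bytes:
    """Pack each (base, quality) pair of one read into a single byte."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    out = bytearray(len(seq))
    for i, (base, q) in enumerate(zip(seq, qual)):
        score = ord(q) - PHRED_OFFSET
        if not 0 <= score <= MAX_QUAL:
            raise ValueError(f"quality score out of range: {score}")
        # BASES.index raises ValueError for a base outside the assumed alphabet.
        out[i] = BASES.index(base) * (MAX_QUAL + 1) + score
    return bytes(out)


def unpack_read(packed: bytes) -> tuple[str, str]:
    """Invert pack_read, returning the (sequence, quality) strings."""
    seq_chars = []
    qual_chars = []
    for code in packed:
        base_index, score = divmod(code, MAX_QUAL + 1)
        seq_chars.append(BASES[base_index])
        qual_chars.append(chr(score + PHRED_OFFSET))
    return "".join(seq_chars), "".join(qual_chars)


# Usage: pack a read, then hand the packed bytes to a general-purpose compressor
# (zlib here, as a stand-in for the multithreaded Blosc compressor).
packed = pack_read("ACGTN", "IIII!")
compressed = zlib.compress(packed)
restored = unpack_read(zlib.decompress(compressed))
```

Because every pair is encoded independently of its neighbours, a loop of this shape could in principle be split across threads or vector lanes, which is one plausible reading of "data-parallel" in the abstract.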
Pages: 213-218
Page count: 6