Predictive Coding of Aligned Next-Generation Sequencing Data

Cited by: 5
Authors
Voges, Jan [1]
Munderloh, Marco [1]
Ostermann, Joern [1]
Affiliations
[1] Leibniz Univ Hannover, TNT, Inst Informat Verarbeitung, Appelstr 9A, D-30167 Hannover, Germany
Keywords
READ ALIGNMENT; COMPRESSION; FORMAT
DOI
10.1109/DCC.2016.98
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Novel high-throughput next-generation sequencing technologies have made the sequencing of huge amounts of genetic information affordable. Given this flood of data, IT costs have become a major obstacle relative to sequencing costs. High-performance compression of genomic data is required to reduce storage size and transmission costs. The high coverage inherent in next-generation sequencing technologies produces highly redundant data. This paper describes a compression algorithm for aligned sequence reads. The proposed algorithm uses alignment information to implicitly assemble local parts of the donor genome in order to compress the sequence reads. In contrast to other algorithms, the proposed compressor does not need a reference to encode sequence reads. Compression is performed on the fly, using solely a sliding window (i.e., a permanently updated short-time memory) as context for the prediction of sequence reads. The algorithm yields compression results on par with or better than the state of the art, compressing the data down to 1.9% of the original size at speeds of up to 60 MB/s, with a memory consumption of only several kilobytes, small enough to fit in today's level 1 CPU caches.
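The sliding-window idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It is a toy model under several assumptions: reads are given as hypothetical `(position, sequence)` pairs sorted by alignment position, alignments contain only matches and mismatches (no insertions, deletions, or clipping), the predictor is a simple per-position majority vote, and the entropy-coding stage is omitted. The names `predict`, `encode`, `decode`, and `window_size` are illustrative only.

```python
from collections import Counter, deque


def predict(window, pos, length):
    """Majority-vote prediction for genome positions pos .. pos+length-1.

    `window` holds the most recent (start, sequence) reads. Positions that
    no read in the window covers are predicted as None.
    """
    votes = [Counter() for _ in range(length)]
    for start, seq in window:
        for i, base in enumerate(seq):
            j = start + i - pos
            if 0 <= j < length:
                votes[j][base] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def encode(reads, window_size=4):
    """Replace each read by (pos, length, residual).

    The residual lists only the (offset, base) pairs where the prediction
    from the sliding window was wrong or missing.
    """
    window = deque(maxlen=window_size)  # short-time memory, oldest read drops out
    records = []
    for pos, seq in reads:
        pred = predict(window, pos, len(seq))
        residual = [(i, b) for i, b in enumerate(seq) if pred[i] != b]
        records.append((pos, len(seq), residual))
        window.append((pos, seq))
    return records


def decode(records, window_size=4):
    """Rebuild the reads by repeating the same predictions and applying residuals."""
    window = deque(maxlen=window_size)
    reads = []
    for pos, length, residual in records:
        pred = predict(window, pos, length)
        for i, b in residual:
            pred[i] = b
        seq = "".join(pred)
        reads.append((pos, seq))
        window.append((pos, seq))
    return reads
```

Because the decoder rebuilds the same window from already decoded reads, no external reference sequence is needed, and memory use is bounded by `window_size` reads. With high coverage, most bases of a new read overlap earlier reads, so the residuals stay short; a real codec would then entropy-code those residuals.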
Pages: 241-250
Page count: 10