Compression of Next-Generation Sequencing Data and of DNA Digital Files

Cited by: 3
Author
Carpentieri, Bruno [1]
Affiliation
[1] Univ Salerno, Dipartimento Informat, Via Giovanni Paolo II 132, I-84084 Fisciano, SA, Italy
Keywords
data compression; Next-Generation Sequencing data; DNA; genomes; genomic data
DOI
10.3390/a13060151
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
The memory and network traffic consumed by newly sequenced biological data have grown sharply in recent years. Genomic projects such as HapMap and 1000 Genomes have contributed to the very large growth of databases and network traffic related to genomic data, and to the development of new, efficient technologies. The large-scale sequencing of DNA samples has attracted new attention and prompted new research, and interest in genomic data within the scientific community has greatly increased. In a very short time, researchers have developed hardware tools, analysis software, algorithms, private databases, and infrastructures to support research in genomics. In this paper, we analyze different approaches for compressing digital files that are generated by Next-Generation Sequencing tools and contain nucleotide sequences. We discuss and evaluate the compression performance of generic compression algorithms by comparing them with Quip, a system designed by Jones et al. specifically for genomic file compression. Moreover, we present a simple but effective technique for the compression of DNA sequences in which we consider only the relevant DNA data, and we experimentally evaluate its performance.
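The abstract does not spell out how the "relevant DNA data" is isolated or encoded, so the following Python sketch is only an illustration under stated assumptions, not the paper's method. It assumes the input is a four-line-per-record FASTQ file, keeps only the nucleotide line of each record, and then reports the sizes produced by three generic compressors from the Python standard library. The 2-bit packing step is a swapped-in, commonly used encoding for the A/C/G/T alphabet; the function names `extract_bases`, `pack_2bit`, and `compressed_sizes` are hypothetical.

```python
import bz2
import lzma
import zlib


def extract_bases(fastq_text: str) -> str:
    """Keep only the nucleotide line (2nd line) of each 4-line FASTQ record."""
    lines = fastq_text.splitlines()
    return "".join(lines[i] for i in range(1, len(lines), 4))


def pack_2bit(bases: str) -> bytes:
    """Pack A/C/G/T into 2 bits per base (assumption: no N or other symbols)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = bytearray()
    for i in range(0, len(bases), 4):
        chunk = bases[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | code[base]
        # Left-align a final chunk that holds fewer than four bases.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for three generic compressors."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    sample = "@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\nIIII\n"
    bases = extract_bases(sample)
    print(len(sample), len(bases), len(pack_2bit(bases)))
    print(compressed_sizes(bases.encode("ascii")))
```

On real sequencing files the comparison would be run on the full file, on the extracted bases, and on the packed bytes; a decoder for the packed form would also need the original sequence length, which this sketch does not store.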
Pages: 11