MZPAQ: a FASTQ data compression tool

Cited by: 3
Authors
El Allali, Achraf [1]
Arshad, Mariam [1]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh, Saudi Arabia
Source
Keywords
DNA compression; Next generation sequences; FASTA files; FASTQ files; ALGORITHM;
DOI
10.1186/s13029-019-0073-5
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Owing to technological progress in Next Generation Sequencing (NGS), the amount of genomic data produced daily has increased tremendously. This increase has shifted the bottleneck of genomic projects from sequencing to computation, specifically to storing, managing, and analyzing large amounts of NGS data. Compression tools can reduce both the physical storage needed for large amounts of genomic data and the bandwidth used to transfer it. Recently, DNA sequence compression has gained much attention among researchers.
Results: In this paper, we study different techniques and algorithms used to compress genomic data. Most of these techniques exploit properties that are unique to DNA sequences to improve the compression rate, and they usually perform better than general-purpose compressors. By exploring the performance of available algorithms, we produce a powerful compression tool for NGS data called MZPAQ. Results show that, in terms of compression ratio, MZPAQ outperforms state-of-the-art tools on all benchmark datasets obtained from a recent survey. MZPAQ offers the best compression ratios regardless of the sequencing platform or the size of the data.
Conclusions: Currently, MZPAQ's strengths are its higher compression ratio and its compatibility with all major sequencing platforms. MZPAQ is most suitable when the size of the compressed data is crucial, as in long-term storage and data transfer. More effort will be made in the future to target other aspects such as compression speed and memory utilization.
Pages: 13