FCompress: An Algorithm for FASTQ Sequence Data Compression

Cited by: 2
Authors
Sardaraz, Muhammad [1 ]
Tahir, Muhammad [1 ]
Affiliation
[1] COMSATS Inst Informat Technol, Dept Comp Sci, Attock, Pakistan
Keywords
High throughput sequencing; NGS technologies; NGS sequence compression; Huffman Coding; FCompress; Algorithm; GENOMIC SEQUENCE
DOI
10.2174/1574893613666180322125337
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Biological sequence data have grown rapidly owing to advances in sequencing technologies and the falling cost of sequencing. This growth presents significant research challenges. Beyond meaningful analysis, storage is itself a challenge, because data production is outpacing storage capacity. Data compression reduces the size of data and thus lowers storage requirements as well as the cost of transmission over the internet. Objective: This article presents a novel compression algorithm (FCompress) for Next Generation Sequencing (NGS) data in FASTQ format. Method: The proposed algorithm uses bit manipulation and dictionary-based compression for the bases. Headers are compressed with reference-based compression, whereas quality scores are compressed with Huffman coding. Results: The proposed algorithm is validated experimentally on real datasets, and the results are compared with both general-purpose and specialized compression programs. Conclusion: The proposed algorithm achieves a better compression ratio in a time comparable to other algorithms.
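The abstract names two generic building blocks, bit manipulation for bases and Huffman coding for quality scores. The sketch below illustrates those two techniques in Python. It is not the FCompress implementation: the 2-bit mapping in `BASE_CODES`, all function names, and the bitstring representation of the Huffman output are assumptions made for illustration. It handles only the symbols A, C, G, and T, assumes non-empty input for the Huffman step, and omits the dictionary-based and reference-based header steps.

```python
import heapq
from collections import Counter

# Assumed 2-bit mapping; the paper's actual encoding is not given in the abstract.
BASE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASES = "ACGT"


def pack_bases(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_CODES[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_bases(packed, length):
    """Invert pack_bases; length is the original number of bases."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(CODE_BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


def huffman_codes(text):
    """Build a Huffman table {symbol: bitstring} for a non-empty string."""
    freq = Counter(text)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def huffman_encode(text, codes):
    """Encode text as a string of '0'/'1' characters (for readability)."""
    return "".join(codes[s] for s in text)


def huffman_decode(bits, codes):
    """Decode a bitstring produced by huffman_encode with the same table."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

Packing stores each base in 2 bits instead of 8, and the Huffman table gives frequent quality characters shorter codes. A real tool would also need to handle `N` bases and write the Huffman output as packed bits rather than as a character string.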
Pages: 123-129
Page count: 7
Related papers
50 items in total
  • [41] A Novel Data Compression Algorithm for Dynamic Data
    Gupta, Rahul
    Gupta, Ashutosh
    Agarwal, Suneeta
    2008 IEEE REGION 8 INTERNATIONAL CONFERENCE ON COMPUTATIONAL TECHNOLOGIES IN ELECTRICAL AND ELECTRONICS ENGINEERING: SIBIRCON 2008, PROCEEDINGS, 2008, : 266 - +
  • [42] A new lossless neighborhood indexing sequence (NIS) algorithm for data compression in wireless sensor networks
    Uthayakumar, J.
    Vengattaraman, T.
    Dhavachelvan, P.
    AD HOC NETWORKS, 2019, 83 : 149 - 157
  • [43] A Compression Error and Optimize Compression Algorithm for Vector Data
    Tan, Guolv
    Wang, Yujun
    2009 INTERNATIONAL CONFERENCE ON ENVIRONMENTAL SCIENCE AND INFORMATION APPLICATION TECHNOLOGY, VOL II, PROCEEDINGS, 2009, : 522 - +
  • [44] GTZ: a fast compression and cloud transmission tool optimized for FASTQ files
    Xing, Yuting
    Li, Gen
    Wang, Zhenguo
    Feng, Bolun
    Song, Zhuo
    Wu, Chengkun
    BMC BIOINFORMATICS, 2017, 18
  • [45] FQC: A novel approach for efficient compression, archival, and dissemination of fastq datasets
    Dutta, Anirban
    Haque, Mohammed Monzoorul
    Bose, Tungadri
    Reddy, C. V. S. K.
    Mande, Sharmila S.
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2015, 13 (03)
  • [46] Lossless and reference-free compression of FASTQ/A files using GeneSqueeze
    Nazari, Foad
    Patel, Sneh
    Larocca, Melissa
    Sansevich, Alina
    Czarny, Ryan
    Schena, Giana
    Murray, Emma K.
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [47] LZP: A new data compression algorithm
    Bloom, C
    DCC '96 - DATA COMPRESSION CONFERENCE, PROCEEDINGS, 1996, : 425 - 425
  • [48] A predictive algorithm for multimedia data compression
    Reza Moradi Rad
    Abdolrahman Attar
    Asadollah Shahbahrami
    Multimedia Systems, 2013, 19 : 103 - 115
  • [49] A LINEAR ALGORITHM FOR DATA-COMPRESSION
    BRENT, RP
    AUSTRALIAN COMPUTER JOURNAL, 1987, 19 (02): : 64 - 68
  • [50] Efficient Compression Algorithm for Multimedia Data
    Pratap, Rameshwar
    Revanuru, Karthik
    Anirudh, Ravi
    Kulkarni, Raghav
    2020 IEEE SIXTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM 2020), 2020, : 245 - 250