Compression of next-generation sequencing quality scores using memetic algorithm

Cited by: 5
Authors
Zhou, Jiarui [1, 2]
Ji, Zhen [2]
Zhu, Zexuan [2]
He, Shan [3]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrument Sci, Hangzhou 310027, Zhejiang, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen City Key Lab Embedded Syst Design, Shenzhen 518060, Peoples R China
[3] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Source
BMC BIOINFORMATICS | 2014 / Volume 15
Funding
National Natural Science Foundation of China;
Keywords
DIFFERENTIAL EVOLUTION; OPTIMIZATION;
DOI
10.1186/1471-2105-15-S15-S10
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The exponential growth of DNA data derived from next-generation sequencing (NGS) poses great challenges to data storage and transmission. Although many compression algorithms have been proposed for the DNA reads in NGS data, few methods are designed specifically to handle the quality scores. Results: In this paper we present MMQSC, an NGS quality score data compressor based on a memetic algorithm (MA). The algorithm extracts raw quality score sequences from FASTQ formatted files and designs a compression codebook using MA-based multimodal optimization. The input data is then compressed in a substitutional manner. Experimental results on five representative NGS data sets show that MMQSC obtains a higher compression ratio than other state-of-the-art methods. In particular, MMQSC is a lossless, reference-free compression algorithm, yet it obtains an average compression ratio of 22.82% on the experimental data sets. Conclusions: The proposed MMQSC compresses NGS quality score data effectively. It can be used to improve the overall compression ratio on FASTQ formatted files.
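As a rough illustration of the pipeline the abstract outlines (extract quality strings from FASTQ, then encode each one against a codebook by substitution), the following Python sketch may help. It is not MMQSC: the codebook here is supplied by hand rather than designed by a memetic algorithm, the nearest-codeword-plus-differences encoding is an assumed toy scheme, and the function names (`extract_quality_lines`, `encode`, `decode`) are hypothetical. It also assumes the common four-line FASTQ record layout with no wrapped sequence or quality lines.

```python
from typing import List, Tuple

# (codeword index, list of (position, original character)) -- assumed toy format
Encoded = Tuple[int, List[Tuple[int, str]]]


def extract_quality_lines(fastq_text: str) -> List[str]:
    """Return the quality string (4th line) of every 4-line FASTQ record."""
    lines = fastq_text.strip().splitlines()
    return lines[3::4]


def mismatches(quality: str, codeword: str) -> List[Tuple[int, str]]:
    """Positions where quality differs from codeword, with the quality character."""
    return [(i, q) for i, (q, c) in enumerate(zip(quality, codeword)) if q != c]


def encode(quality: str, codebook: List[str]) -> Encoded:
    """Substitute a quality string with its closest codeword plus the differences."""
    same_len = [i for i, cw in enumerate(codebook) if len(cw) == len(quality)]
    if not same_len:
        # No usable codeword: store every character explicitly (index -1).
        return -1, list(enumerate(quality))
    best = min(same_len, key=lambda i: len(mismatches(quality, codebook[i])))
    return best, mismatches(quality, codebook[best])


def decode(encoded: Encoded, codebook: List[str]) -> str:
    """Rebuild the exact original quality string, so the scheme stays lossless."""
    index, diffs = encoded
    chars = list(codebook[index]) if index >= 0 else [""] * len(diffs)
    for pos, ch in diffs:
        chars[pos] = ch
    return "".join(chars)


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGG\n+\nII#I\n"
    codebook = ["IIII", "####"]  # hand-picked for the example, not optimized
    for q in extract_quality_lines(fastq):
        enc = encode(q, codebook)
        assert decode(enc, codebook) == q
        print(q, "->", enc)
```

In this toy scheme the saving comes from storing only a codeword index and a short difference list whenever a quality string closely resembles a codeword, which is why the choice of codebook matters; the paper's contribution is the MA-based search for that codebook, which this sketch does not attempt.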
Pages: 7
Related papers
50 in total
  • [1] Compression of next-generation sequencing quality scores using memetic algorithm
    Jiarui Zhou
    Zhen Ji
    Zexuan Zhu
    Shan He
    BMC Bioinformatics, 15
  • [2] Transformations for the compression of FASTQ quality scores of next-generation sequencing data
    Wan, Raymond
    Vo Ngoc Anh
    Asai, Kiyoshi
    BIOINFORMATICS, 2012, 28 (05) : 628 - 635
  • [3] A Parallel Algorithm for Compression of Big Next-Generation Sequencing Datasets
    Perez, Sandino Vargas
    Saeed, Fahad
    2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 3, 2015, : 196 - 201
  • [4] Next-Generation Sequencing: Next-Generation Quality in Pediatrics
    Wortmann, Saskia B.
    Spenger, Johannes
    Preisel, Martin
    Koch, Johannes
    Rauscher, Christian
    Bader, Ingrid
    Mayr, Johannes A.
    Sperl, Wolfgang
    PADIATRIE UND PADOLOGIE, 2018, 53 (06): : 278 - 283
  • [5] Exploring the Consistency of the Quality Scores with Machine Learning for Next-Generation Sequencing Experiments
    Cosgun, Erdal
    Oh, Min
    BIOMED RESEARCH INTERNATIONAL, 2020, 2020
  • [6] Comparing nominal and real quality scores on next-generation sequencing genotype calls
    Alexander H Stram
    BMC Proceedings, 5 (Suppl 9)
  • [7] Quality Guidelines for Next-Generation Sequencing
    Baudhuin, Linnea M.
    CLINICAL CHEMISTRY, 2013, 59 (05) : 858 - 859
  • [8] Quality Control in Next-Generation Sequencing Using DNA fingerprinting
    Akabari, R.
    Zheng, Z.
    Lal, J.
    Gandhi, S.
    Qin, D.
    JOURNAL OF MOLECULAR DIAGNOSTICS, 2015, 17 (06): : 847 - 847
  • [9] ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data
    Cabanski, Christopher R.
    Cavin, Keary
    Bizon, Chris
    Wilkerson, Matthew D.
    Parker, Joel S.
    Wilhelmsen, Kirk C.
    Perou, Charles M.
    Marron, J. S.
    Hayes, D. Neil
    BMC BIOINFORMATICS, 2012, 13
  • [10] ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data
    Christopher R Cabanski
    Keary Cavin
    Chris Bizon
    Matthew D Wilkerson
    Joel S Parker
    Kirk C Wilhelmsen
    Charles M Perou
    JS Marron
    D Neil Hayes
    BMC Bioinformatics, 13