Compression of next-generation sequencing quality scores using memetic algorithm

Times cited: 5
Authors
Zhou, Jiarui [1,2]
Ji, Zhen [2]
Zhu, Zexuan [2]
He, Shan [3]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrument Sci, Hangzhou 310027, Zhejiang, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen City Key Lab Embedded Syst Design, Shenzhen 518060, Peoples R China
[3] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Source
BMC Bioinformatics, 2014, Vol. 15
Funding
National Natural Science Foundation of China
Keywords
Differential evolution; Optimization
DOI
10.1186/1471-2105-15-S15-S10
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The exponential growth of DNA data generated by next-generation sequencing (NGS) poses great challenges for data storage and transmission. Although many compression algorithms have been proposed for the DNA reads in NGS data, few methods are designed specifically to handle the quality scores.
Results: In this paper we present MMQSC, an NGS quality score data compressor based on a memetic algorithm (MA). The algorithm extracts raw quality score sequences from FASTQ-formatted files and designs a compression codebook using MA-based multimodal optimization. The input data is then compressed in a substitutional manner. Experimental results on five representative NGS data sets show that MMQSC obtains a higher compression ratio than other state-of-the-art methods. In particular, although MMQSC is a lossless, reference-free compression algorithm, it obtains an average compression ratio of 22.82% on the experimental data sets.
Conclusions: The proposed MMQSC compresses NGS quality score data effectively and can be used to improve the overall compression ratio of FASTQ-formatted files.
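The pipeline outlined in the abstract (extract quality-score lines from FASTQ, then replace substrings with codebook entries) can be illustrated with a minimal Python sketch. It is not MMQSC itself: it assumes uncompressed FASTQ with exactly four lines per record, takes a hypothetical hand-picked codebook instead of one designed by the memetic algorithm, and uses an illustrative token format (an integer index for a codebook hit, a literal character otherwise) with a greedy longest-match rule. The function names `extract_quality_scores`, `encode`, and `decode` are hypothetical.

```python
from typing import List, Union

# Illustrative token format (an assumption, not the paper's encoding):
# an int is an index into the codebook, a str is one literal quality character.
Token = Union[int, str]


def extract_quality_scores(fastq_text: str) -> List[str]:
    """Return the quality line of each record.

    Assumes plain 4-line FASTQ records (no line wrapping). Records are split
    by position rather than by searching for '@', because '@' is also a
    valid quality character.
    """
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected exactly 4 lines per FASTQ record")
    return [lines[i + 3] for i in range(0, len(lines), 4)]


def encode(quality: str, codebook: List[str]) -> List[Token]:
    """Greedy longest-match substitution against a (hypothetical) codebook."""
    # Try longer codebook words first so each match covers as many symbols as possible.
    order = sorted(range(len(codebook)), key=lambda k: len(codebook[k]), reverse=True)
    tokens: List[Token] = []
    i = 0
    while i < len(quality):
        for k in order:
            word = codebook[k]
            if word and quality.startswith(word, i):
                tokens.append(k)  # emit the codebook index
                i += len(word)
                break
        else:
            tokens.append(quality[i])  # no codebook word matches here: emit a literal
            i += 1
    return tokens


def decode(tokens: List[Token], codebook: List[str]) -> str:
    """Invert encode(): expand indices back to codebook words, keep literals."""
    return "".join(codebook[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nII#I\n"
    codebook = ["II", "#"]  # hypothetical codebook, chosen by hand
    for q in extract_quality_scores(fastq):
        tokens = encode(q, codebook)
        assert decode(tokens, codebook) == q  # the substitution is lossless
        print(q, "->", tokens)
```

Here the codebook is fixed by hand; in MMQSC it is the output of the MA-based multimodal optimization. The token stream would also still need to be serialized into bits before a compression ratio could be measured.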
Pages: 7