Compression of next-generation sequencing quality scores using memetic algorithm

Cited by: 5
Authors
Zhou, Jiarui [1, 2]
Ji, Zhen [2]
Zhu, Zexuan [2]
He, Shan [3]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrument Sci, Hangzhou 310027, Zhejiang, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen City Key Lab Embedded Syst Design, Shenzhen 518060, Peoples R China
[3] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Source
BMC BIOINFORMATICS | 2014 / Vol. 15
Funding
National Natural Science Foundation of China;
Keywords
DIFFERENTIAL EVOLUTION; OPTIMIZATION;
DOI
10.1186/1471-2105-15-S15-S10
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The exponential growth of DNA data derived from next-generation sequencing (NGS) poses great challenges to data storage and transmission. Although many compression algorithms have been proposed for DNA reads in NGS data, few methods are designed specifically to handle the quality scores. Results: In this paper we present MMQSC, an NGS quality score compressor based on a memetic algorithm (MA). The algorithm extracts raw quality score sequences from FASTQ-formatted files and designs a compression codebook using MA-based multimodal optimization. The input data are then compressed in a substitutional manner. Experimental results on five representative NGS data sets show that MMQSC obtains a higher compression ratio than other state-of-the-art methods. In particular, MMQSC is a lossless, reference-free compression algorithm, yet obtains an average compression ratio of 22.82% on the experimental data sets. Conclusions: The proposed MMQSC compresses NGS quality score data effectively. It can be used to improve the overall compression ratio on FASTQ-formatted files.
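The extraction step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the MMQSC implementation: it assumes the common four-line FASTQ record layout (header, bases, separator, quality string) and a Phred+33 encoding, and the helper names `read_quality_scores` and `phred_values` are hypothetical.

```python
import io


def read_quality_scores(handle):
    """Yield the quality line of each record, assuming four-line FASTQ records."""
    for index, line in enumerate(handle):
        # Assumption: the 4th line of every record holds the quality string.
        if index % 4 == 3:
            yield line.rstrip("\n")


def phred_values(quality, offset=33):
    """Decode a quality string to integer scores (Phred+33 is an assumption)."""
    return [ord(ch) - offset for ch in quality]


# Tiny in-memory example with two made-up records.
sample = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
qualities = list(read_quality_scores(sample))
scores = [phred_values(q) for q in qualities]
```

The resulting quality strings are the kind of input a codebook-based substitutional compressor would work on. How MMQSC designs the codebook is not described in this record, so it is not sketched here.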
Pages: 7