Compression of next-generation sequencing quality scores using memetic algorithm

Times cited: 5
Authors
Zhou, Jiarui [1 ,2 ]
Ji, Zhen [2 ]
Zhu, Zexuan [2 ]
He, Shan [3 ]
Affiliations
[1] Zhejiang Univ, Coll Biomed Engn & Instrument Sci, Hangzhou 310027, Zhejiang, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen City Key Lab Embedded Syst Design, Shenzhen 518060, Peoples R China
[3] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Source
BMC Bioinformatics | 2014, Vol. 15
Funding
National Natural Science Foundation of China
Keywords
DIFFERENTIAL EVOLUTION; OPTIMIZATION
DOI
10.1186/1471-2105-15-S15-S10
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The exponential growth of DNA data derived from next-generation sequencing (NGS) poses great challenges for data storage and transmission. Although many compression algorithms have been proposed for the DNA reads in NGS data, few methods are designed specifically to handle the quality scores.
Results: In this paper we present MMQSC, an NGS quality score data compressor based on a memetic algorithm (MA). The algorithm extracts the raw quality score sequences from FASTQ-formatted files and designs a compression codebook using MA-based multimodal optimization. The input data are then compressed in a substitutional manner. Experimental results on five representative NGS data sets show that MMQSC obtains a higher compression ratio than other state-of-the-art methods. In particular, MMQSC is a lossless, reference-free compression algorithm, yet it obtains an average compression ratio of 22.82% on the experimental data sets.
Conclusions: The proposed MMQSC compresses NGS quality score data effectively. It can be used to improve the overall compression ratio of FASTQ-formatted files.
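The pipeline summarized in the abstract (extract quality strings from FASTQ, build a codebook, then substitute codebook entries into the input) can be illustrated with a minimal Python sketch. This is not MMQSC: here the codebook is simply the most frequent fixed-length substrings, a stand-in for the memetic multimodal optimization the paper uses. The function names, the token layout (literal characters kept as their code points, codebook entries emitted as 256 + index), and the parameters `k` and `size` are hypothetical choices for illustration, and the parser assumes standard four-line FASTQ records.

```python
from collections import Counter


def extract_quality_scores(fastq_text: str) -> list[str]:
    """Return the quality line (4th line) of each FASTQ record.

    Assumes plain four-line records: @header, sequence, '+', quality.
    """
    lines = [ln for ln in fastq_text.splitlines() if ln != ""]
    return [lines[i + 3] for i in range(0, len(lines) - 3, 4)]


def build_codebook(seqs: list[str], k: int = 4, size: int = 16) -> list[str]:
    """Hypothetical stand-in for MA-based codebook design:
    pick the `size` most frequent length-k substrings (overlapping counts)."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    return [word for word, _ in counts.most_common(size)]


def encode(seq: str, codebook: list[str], k: int) -> list[int]:
    """Greedy substitutional coding: replace codebook words with one token,
    otherwise emit the literal character. Quality chars are ASCII (< 256),
    so literal tokens never collide with codebook tokens (>= 256)."""
    index = {word: 256 + j for j, word in enumerate(codebook)}
    tokens, i = [], 0
    while i < len(seq):
        word = seq[i:i + k]
        if len(word) == k and word in index:
            tokens.append(index[word])
            i += k
        else:
            tokens.append(ord(seq[i]))
            i += 1
    return tokens


def decode(tokens: list[int], codebook: list[str]) -> str:
    """Invert `encode`, which makes the scheme lossless."""
    return "".join(chr(t) if t < 256 else codebook[t - 256] for t in tokens)


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nIIIH\n"
    quals = extract_quality_scores(fastq)
    cb = build_codebook(quals, k=2, size=2)
    for q in quals:
        toks = encode(q, cb, 2)
        print(q, toks, decode(toks, cb) == q)
```

With the toy input, the four-character quality strings each shrink to two tokens and decode back exactly. A real compressor would still have to serialize the token stream (for example with some bit-level or entropy coding) and store the codebook; this sketch only shows the substitution step.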
Pages: 7