Compression algorithm for colored de Bruijn graphs

Authors
Rahman, Amatur [1]
Dufresne, Yoann [4, 5]
Medvedev, Paul [1, 2, 3]
Affiliations
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
[2] Penn State Univ, Dept Biochem & Mol Biol, University Pk, PA 16802 USA
[3] Penn State Univ, Huck Inst Life Sci, University Pk, PA 16802 USA
[4] Univ Paris Cite, Inst Pasteur, G5 Sequence Bioinformat, Paris, France
[5] Univ Paris Cite, Inst Pasteur, Bioinformat & Biostat Hub, F-75015 Paris, France
Keywords
SEARCH
DOI
10.1186/s13015-024-00254-6
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A colored de Bruijn graph (also called a set of k-mer sets) is a set of k-mers with every k-mer assigned a set of colors. Colored de Bruijn graphs are used in a variety of applications, including variant calling, genome assembly, and database search. However, their size has posed a scalability challenge to algorithm developers and users. Numerous indexing data structures have been proposed that store the graph compactly while supporting fast query operations. However, disk compression algorithms, which do not need to support queries on the compressed data and can thus be more space-efficient, have received little attention. The dearth of specialized compression tools has been a detriment to tool developers, tool users, and reproducibility efforts. In this paper, we develop a new tool that compresses colored de Bruijn graphs to disk, building on previous ideas for compression of k-mer sets and indexing colored de Bruijn graphs. We test our tool, called ESS-color, on various datasets, including both sequencing data and whole genomes. ESS-color achieves better compression than all evaluated tools on all datasets, with no other tool able to consistently achieve less than 44% space overhead. The software is available at http://github.com/medvedevgroup/ESSColor.
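To make the definition in the abstract concrete, here is a minimal Python sketch of a colored de Bruijn graph as a mapping from each k-mer to the set of colors (here, sample names) in which it occurs. The function name `colored_kmers` and the toy sequences are hypothetical and chosen only for illustration. This is not how ESS-color stores or compresses the graph, and it omits details such as treating a k-mer and its reverse complement as the same k-mer.

```python
from collections import defaultdict


def colored_kmers(samples, k):
    """Map each k-mer to the set of colors (sample names) that contain it.

    `samples` is a dict of color -> sequence; both names are illustrative
    assumptions, not part of any real tool's interface.
    """
    colors = defaultdict(set)
    for color, sequence in samples.items():
        # Slide a window of length k over the sequence.
        for i in range(len(sequence) - k + 1):
            colors[sequence[i:i + k]].add(color)
    return dict(colors)


# Two toy samples; each sample name acts as one color.
samples = {"s1": "ACGTA", "s2": "CGTAC"}
graph = colored_kmers(samples, 3)
```

In this toy input, k-mers shared by both sequences end up with both colors, while k-mers unique to one sequence carry a single color. A naive representation like this repeats identical color sets many times over, which is the kind of redundancy a specialized compressor can exploit.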
Pages: 11
Related papers
  • [1] Succinct colored de Bruijn graphs
    Muggli, Martin D.
    Bowe, Alexander
    Noyes, Noelle R.
    Morley, Paul S.
    Belk, Keith E.
    Raymond, Robert
    Gagie, Travis
    Puglisi, Simon J.
    Boucher, Christina
    [J]. BIOINFORMATICS, 2017, 33 (20) : 3181 - 3187
  • [2] Colored de Bruijn graphs and the genome halving problem
    Alekseyev, Max A.
    Pevzner, Pavel A.
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2007, 4 (01) : 98 - 107
  • [3] Meta-colored Compacted de Bruijn Graphs
    Pibiri, Giulio Ermanno
    Fan, Jason
    Patro, Rob
    [J]. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2024, 2024, 14758 : 131 - 146
  • [4] De novo assembly and genotyping of variants using colored de Bruijn graphs
    Zamin Iqbal
    Mario Caccamo
    Isaac Turner
    Paul Flicek
    Gil McVean
    [J]. Nature Genetics, 2012, 44 : 226 - 232
  • [5] De novo assembly and genotyping of variants using colored de Bruijn graphs
    Iqbal, Zamin
    Caccamo, Mario
    Turner, Isaac
    Flicek, Paul
    McVean, Gil
    [J]. NATURE GENETICS, 2012, 44 (02) : 226 - 232
  • [6] Building large updatable colored de Bruijn graphs via merging
    Muggli, Martin D.
    Alipanahi, Bahar
    Boucher, Christina
    [J]. BIOINFORMATICS, 2019, 35 (14) : I51 - I60
  • [7] Metagenome SNP calling via read-colored de Bruijn graphs
    Alipanahi, Bahar
    Muggli, Martin D.
    Jundi, Musa
    Noyes, Noelle R.
    Boucher, Christina
    [J]. BIOINFORMATICS, 2020, 36 (22-23) : 5275 - 5281
  • [8] Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT
    Cracco, Andrea
    Tomescu, Alexandru I.
    [J]. GENOME RESEARCH, 2023, 33 (07) : 1198 - 1207
  • [9] Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs
    Guillaume Holley
    Páll Melsted
    [J]. Genome Biology, 21
  • [10] Alignment- and reference-free phylogenomics with colored de Bruijn graphs
    Roland Wittler
    [J]. Algorithms for Molecular Biology, 15