A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes

Times cited: 1
Authors
Guo, Jindan [1]
Pang, Erli [1]
Song, Hongtao [1]
Lin, Kui [1]
Affiliation
[1] Beijing Normal University, College of Life Sciences, State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory of Biodiversity Science and Ecological Engineering, Beijing, People's Republic of China
Keywords
Genome graph; Coordinate system; Variant detection; Multiple sequence alignment; Genome; Evolution; Algorithm; Representation; Superbubbles
DOI
10.1186/s12859-021-04149-w
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality, chromosome-level and haplotype-resolved genome assemblies have become available, creating great opportunities for computational pangenomics. Although genome graphs are among the most useful models for representing a pangenome, their structural complexity makes it difficult to present genome information as intuitively as a linear reference genome does. Efficiently and accurately analyzing the spatial structure of a genome graph and coordinating the information it carries therefore remains a substantial challenge.

Results: We developed a new model, the colored superbubble (cSupB), that can overcome the complexity of genome graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. In addition, cSupB provides a novel method that uses complete topological information to efficiently detect small indels (<50 bp) in highly similar samples, as validated on simulated datasets. We also demonstrate that cSupB can handle complex cyclic structures.

Conclusions: Although the solution was made suitable for increasingly complex genome graphs by relaxing the directed acyclic graph constraint, the cSupB motif and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program implementing our method, available at https://github.com/eggleader/cSupB.
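Superbubbles (listed in the keywords) are the structure that cSupB builds on. As a minimal sketch for readers new to the concept, the Python below checks the four standard superbubble conditions of Onodera, Sadakane and Shibuya (2013) for a candidate pair (s, t): t is reachable from s; the vertices reachable from s without passing through t are exactly those that reach t without passing through s; the induced subgraph is acyclic; and no inner vertex t' closes a smaller bubble (s, t'). This is not the authors' cSupB algorithm. The adjacency-dict graph, the per-vertex sample sets in `colors`, and the helper names `is_superbubble` and `bubble_samples` are hypothetical, and the sketch keeps the acyclicity condition that the abstract describes relaxing.

```python
from __future__ import annotations

from collections import deque

# Hypothetical representation: vertex -> list of successor vertices.
Graph = dict[int, list[int]]


def _reverse(graph: Graph) -> Graph:
    """Build the reverse adjacency (vertex -> predecessors)."""
    rev: Graph = {v: [] for v in graph}
    for u, succs in graph.items():
        for v in succs:
            rev.setdefault(v, []).append(u)
    return rev


def _reach(adj: Graph, start: int, stop: int) -> set[int]:
    """BFS from `start` that may reach `stop` but never expands it."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == stop:
            continue  # do not pass through the stop vertex
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def _is_acyclic(graph: Graph, nodes: set[int]) -> bool:
    """Kahn's topological sort restricted to the subgraph induced by `nodes`."""
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for v in graph.get(u, []):
            if v in nodes:
                indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in graph.get(u, []):
            if v in nodes:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return removed == len(nodes)


def _candidate(graph: Graph, rev: Graph, s: int, t: int) -> set[int] | None:
    """Return the vertex set U if (s, t) meets conditions 1-3, else None."""
    if s == t:
        return None
    forward = _reach(graph, s, t)        # reachable from s, not through t
    if t not in forward:                 # 1. reachability
        return None
    backward = _reach(rev, t, s)         # reaches t, not through s
    if forward != backward:              # 2. matching
        return None
    if not _is_acyclic(graph, forward):  # 3. acyclicity
        return None
    return forward


def is_superbubble(graph: Graph, s: int, t: int) -> bool:
    rev = _reverse(graph)
    inner = _candidate(graph, rev, s, t)
    if inner is None:
        return False
    # 4. minimality: no inner vertex t' closes a smaller bubble (s, t').
    return all(_candidate(graph, rev, s, u) is None for u in inner - {s, t})


def bubble_samples(graph: Graph, colors: dict[int, set[str]], s: int, t: int) -> set[str]:
    """Hypothetical helper: union of sample colors on a superbubble's vertices."""
    if not is_superbubble(graph, s, t):
        return set()
    return set().union(*(colors.get(v, set()) for v in _reach(graph, s, t)))


if __name__ == "__main__":
    g = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
    colors = {1: {"A", "B"}, 2: {"A"}, 3: {"B"}, 4: {"A", "B"}, 5: {"A", "B"}}
    print(is_superbubble(g, 1, 4))                  # True
    print(is_superbubble(g, 1, 5))                  # False: (1, 4) is smaller
    print(sorted(bubble_samples(g, colors, 1, 4)))  # ['A', 'B']
```

On the toy graph, (1, 4) is a superbubble whose two branches carry samples A and B, while (1, 5) fails minimality because the smaller bubble (1, 4) lies inside it. The check repeats a traversal for every inner vertex, so it is much slower than the linear-time superbubble enumeration algorithms in the literature; it is written for readability only.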
Pages: 22
Source: BMC Bioinformatics, vol. 22