Chaining for accurate alignment of erroneous long reads to acyclic variation graphs

Cited by: 8
Authors
Ma, Jun [1]
Caceres, Manuel [1]
Salmela, Leena [1]
Makinen, Veli [1]
Tomescu, Alexandru I. [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki 00014, Finland
Funding
European Research Council; Academy of Finland
Keywords
ALGORITHMS;
DOI
10.1093/bioinformatics/btad460
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit [Garrison et al. (Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875-9)] is a popular aligner of short reads, GraphAligner [Rautiainen and Marschall (GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020;21:253-28)] is the state-of-the-art aligner of erroneous long reads. GraphAligner finds candidate read occurrences by individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds.
Results: We present a new algorithm to co-linearly chain a set of seeds in a string-labeled acyclic graph, together with the first efficient implementation of such a co-linear chaining algorithm in a new aligner of erroneous long reads to acyclic variation graphs, GraphChainer. We run experiments aligning real and simulated PacBio CLR reads with average error rates of 15% and 5%. Compared to GraphAligner, GraphChainer aligns 12-17% more reads, and 21-28% more total read length, on real PacBio CLR reads from human chromosomes 1, 22, and the whole human pangenome. On both simulated and real data, GraphChainer aligns between 95% and 99% of all reads, and of total read length. We also show that minigraph [Li et al. (The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020;21:265-19)] and minichain [Chandra and Jain (Sequence to graph alignment using gap-sensitive co-linear chaining. In: Proceedings of the 27th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2023). Springer, 2023, 58-73)] achieve an accuracy below 60% in this setting.
Availability and implementation: GraphChainer is freely available at https://github.com/algbio/GraphChainer. The datasets and evaluation pipeline can be reached from the previous address.
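To illustrate what co-linearly chaining seeds in an acyclic graph means, the following is a minimal Python sketch. It is not the algorithm or code of GraphChainer: it is a plain quadratic dynamic program under simplifying assumptions made here for illustration. Each seed (anchor) is assumed to lie within a single graph node and is given as a hypothetical triple (node, read_start, read_end) with a half-open read interval; an anchor may follow another only if its read interval starts at or after the other's end and its node is reachable from the other's node by a path of at least one edge; the chain score is simply the total read length covered. The function names strict_reachability and chain_anchors are illustrative and do not come from the paper.

```python
def strict_reachability(num_nodes, edges):
    """For each node u of a DAG, return the set of nodes reachable from u
    by a path with at least one edge. Assumes the graph is acyclic."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
    reach = [None] * num_nodes

    def visit(u):
        # Memoized depth-first search; terminates because the graph is acyclic.
        if reach[u] is None:
            acc = set()
            for v in adj[u]:
                acc.add(v)
                acc |= visit(v)
            reach[u] = acc
        return reach[u]

    for u in range(num_nodes):
        visit(u)
    return reach


def chain_anchors(num_nodes, edges, anchors):
    """Return (score, chain) for a best co-linear chain of anchors.

    anchors: list of (node, read_start, read_end), read interval half-open
             and non-empty (illustrative assumption: one node per anchor).
    score:   total read length covered by the chain.
    chain:   list of anchor indices in chain order.
    """
    if not anchors:
        return 0, []
    reach = strict_reachability(num_nodes, edges)
    # Any valid predecessor ends before the current anchor starts, so it also
    # starts earlier; processing by read_start therefore finalizes it first.
    order = sorted(range(len(anchors)), key=lambda i: anchors[i][1])
    best = [0] * len(anchors)
    prev = [-1] * len(anchors)
    for pos, i in enumerate(order):
        node_i, start_i, end_i = anchors[i]
        best[i] = end_i - start_i
        for j in order[:pos]:
            node_j, _, end_j = anchors[j]
            # Co-linearity: no overlap in the read, and node_i after node_j in the graph.
            if end_j <= start_i and node_i in reach[node_j]:
                candidate = best[j] + (end_i - start_i)
                if candidate > best[i]:
                    best[i] = candidate
                    prev[i] = j
    # Trace back from the best-scoring anchor.
    last = max(range(len(anchors)), key=lambda i: best[i])
    score = best[last]
    chain = []
    while last != -1:
        chain.append(last)
        last = prev[last]
    chain.reverse()
    return score, chain


if __name__ == "__main__":
    # Small bubble-shaped DAG: 0 -> {1, 2} -> 3 -> 4
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    anchors = [(0, 0, 5), (1, 5, 9), (2, 4, 12), (3, 12, 20), (2, 20, 30)]
    print(chain_anchors(5, edges, anchors))
```

In this toy input the anchor on node 2 covering read positions 20 to 30 cannot extend a chain that already passed through node 3, because node 2 is not reachable from node 3; this is the kind of graph-consistency constraint that extending seeds individually does not take into account.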
Pages: 10