Chaining for accurate alignment of erroneous long reads to acyclic variation graphs

Cited by: 8
Authors
Ma, Jun [1]
Caceres, Manuel [1]
Salmela, Leena [1]
Makinen, Veli [1]
Tomescu, Alexandru I. [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki 00014, Finland
Funding
European Research Council; Academy of Finland
Keywords
ALGORITHMS;
DOI
10.1093/bioinformatics/btad460
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit [Garrison et al. (Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875-9)] is a popular aligner of short reads, GraphAligner [Rautiainen and Marschall (GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020;21:253-28)] is the state-of-the-art aligner of erroneous long reads. GraphAligner finds candidate read occurrences by individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds.

Results: We present a new algorithm to co-linearly chain a set of seeds in a string-labeled acyclic graph, together with the first efficient implementation of such a co-linear chaining algorithm in a new aligner of erroneous long reads to acyclic variation graphs, GraphChainer. We run experiments aligning real and simulated PacBio CLR reads with average error rates of 15% and 5%. Compared to GraphAligner, GraphChainer aligns 12-17% more reads, and 21-28% more total read length, on real PacBio CLR reads from human chromosomes 1, 22, and the whole human pangenome. On both simulated and real data, GraphChainer aligns between 95% and 99% of all reads, and of total read length. We also show that minigraph [Li et al. (The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020;21:265-19.)] and minichain [Chandra and Jain (Sequence to graph alignment using gap-sensitive co-linear chaining. In: Proceedings of the 27th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2023). Springer, 2023, 58-73.)] obtain an accuracy of <60% in this setting.
Availability and implementation: GraphChainer is freely available at https://github.com/algbio/GraphChainer. The datasets and evaluation pipeline can be reached from the previous address.
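
To illustrate what co-linearly chaining seeds in an acyclic graph means, the following is a minimal sketch in Python. It is not GraphChainer's algorithm or code: it is a simple quadratic-time dynamic program written under simplifying assumptions that are not from the paper. Each seed is assumed to lie inside a single graph node and is given as a hypothetical tuple (read_start, read_end, node) with a half-open read interval; one seed may precede another only if their read intervals do not overlap and the second node is reachable from the first by a nonempty path. The function names `reachable_sets` and `chain_seeds` are invented for this illustration.

```python
def reachable_sets(adj):
    """Map each node of a DAG to the set of nodes reachable by a nonempty path.

    adj is a dict: node -> list of successor nodes (assumed acyclic).
    """
    memo = {}

    def visit(u):
        if u not in memo:
            out = set()
            for v in adj.get(u, []):
                out.add(v)
                out |= visit(v)
            memo[u] = out
        return memo[u]

    for u in adj:
        visit(u)
    return memo


def chain_seeds(seeds, adj):
    """Return (score, chain): a chain of seeds maximizing total covered read length.

    Assumption (illustrative only): seed a may precede seed b when
    a.read_end <= b.read_start and b.node is reachable from a.node.
    """
    if not seeds:
        return 0, []
    reach = reachable_sets(adj)
    order = sorted(seeds)  # by read_start, so every valid predecessor comes earlier
    n = len(order)
    score = [end - start for start, end, _ in order]  # chain consisting of one seed
    prev = [-1] * n
    for i, (si, ei, vi) in enumerate(order):
        for j in range(i):
            _, ej, vj = order[j]
            if ej <= si and vi in reach.get(vj, set()):
                cand = score[j] + (ei - si)
                if cand > score[i]:
                    score[i] = cand
                    prev[i] = j
    best = max(range(n), key=score.__getitem__)
    chain = []
    i = best
    while i != -1:  # follow predecessor links back to the start of the chain
        chain.append(order[i])
        i = prev[i]
    chain.reverse()
    return score[best], chain


# Toy DAG with a bubble: 1 -> {2, 3} -> 4
adj = {1: [2, 3], 2: [4], 3: [4], 4: []}
seeds = [(0, 5, 1), (5, 9, 2), (4, 12, 3), (12, 20, 4), (6, 10, 4)]
print(chain_seeds(seeds, adj))
```

On this toy input the best chain takes the seeds on nodes 1, 2, and 4, covering 17 read positions. Extending each seed on its own, as the abstract describes for GraphAligner, would consider these seeds individually, whereas chaining scores them jointly. An aligner working at real scale would need to handle overlapping seeds and seeds that span several nodes, and would need something much faster than this all-pairs comparison.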
Pages: 10