Chaining for accurate alignment of erroneous long reads to acyclic variation graphs

Cited by: 8
Authors
Ma, Jun [1]
Caceres, Manuel [1]
Salmela, Leena [1]
Makinen, Veli [1]
Tomescu, Alexandru I. [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki 00014, Finland
Funding
European Research Council; Academy of Finland
Keywords
ALGORITHMS;
DOI
10.1093/bioinformatics/btad460
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit [Garrison et al. (Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875-9)] is a popular aligner of short reads, GraphAligner [Rautiainen and Marschall (GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020;21:253-28)] is the state-of-the-art aligner of erroneous long reads. GraphAligner works by finding candidate read occurrences based on individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds.
Results: We present a new algorithm to co-linearly chain a set of seeds in a string-labeled acyclic graph, together with the first efficient implementation of such a co-linear chaining algorithm into a new aligner of erroneous long reads to acyclic variation graphs, GraphChainer. We run experiments aligning real and simulated PacBio CLR reads with average error rates 15% and 5%. Compared to GraphAligner, GraphChainer aligns 12-17% more reads, and 21-28% more total read length, on real PacBio CLR reads from human chromosomes 1, 22, and the whole human pangenome. On both simulated and real data, GraphChainer aligns between 95% and 99% of all reads, and of total read length. We also show that minigraph [Li et al. (The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020;21:265-19.)] and minichain [Chandra and Jain (Sequence to graph alignment using gap-sensitive co-linear chaining. In: Proceedings of the 27th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2023). Springer, 2023, 58-73.)] obtain an accuracy of <60% on this setting.
Availability and implementation: GraphChainer is freely available at https://github.com/algbio/GraphChainer. The datasets and evaluation pipeline can be reached from the previous address.
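To make the idea of co-linear chaining on an acyclic graph concrete, the following is a minimal illustrative sketch and not GraphChainer's algorithm or code. It assumes a heavily simplified model: each seed is a hypothetical tuple (read_start, read_end, node) matching a half-open read interval to a single graph node; seed j may precede seed i only if the read intervals do not overlap and the node of i is strictly reachable from the node of j; and the chain score is the total seed length. The function name chain_seeds, the seed format, and the quadratic dynamic program are all assumptions made for illustration.

```python
from functools import lru_cache


def chain_seeds(edges, seeds):
    """Naive O(n^2) co-linear chaining of seeds on a DAG (illustrative only).

    edges: dict mapping a node to a list of its successor nodes (must be acyclic).
    seeds: list of (read_start, read_end, node), read interval half-open.
    Returns (best_score, chain), where chain is a list of seed indices.
    """

    @lru_cache(maxsize=None)
    def reachable(u):
        # All nodes strictly reachable from u (u itself is excluded).
        out = set()
        for v in edges.get(u, ()):
            out.add(v)
            out |= reachable(v)
        return frozenset(out)

    if not seeds:
        return 0, []

    # Any valid predecessor ends before seed i starts, so it also starts earlier.
    order = sorted(range(len(seeds)), key=lambda i: seeds[i][0])
    best, back, done = {}, {}, []
    for i in order:
        start_i, end_i, node_i = seeds[i]
        best[i], back[i] = end_i - start_i, None
        for j in done:
            _, end_j, node_j = seeds[j]
            if end_j <= start_i and node_i in reachable(node_j):
                cand = best[j] + (end_i - start_i)
                if cand > best[i]:
                    best[i], back[i] = cand, j
        done.append(i)

    # Trace back from the best-scoring seed to recover the chain.
    tip = max(best, key=best.get)
    score, chain = best[tip], []
    while tip is not None:
        chain.append(tip)
        tip = back[tip]
    return score, chain[::-1]


# Toy "bubble" graph: a -> b -> d and a -> c -> d.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
seeds = [(0, 5, "a"), (5, 9, "b"), (6, 12, "c"), (12, 15, "d"), (3, 8, "d")]
print(chain_seeds(edges, seeds))  # (14, [0, 2, 3])
```

In this toy example the chain through node c wins because its seeds cover more of the read than the chain through b, whereas extending any single seed on its own would ignore that context. A practical aligner would need seeds that span paths of several nodes, offsets within nodes, and a far faster precedence test than the explicit reachability sets used here.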
Pages: 10