Chaining for accurate alignment of erroneous long reads to acyclic variation graphs

Cited by: 8

Authors:

Ma, Jun [1]

Caceres, Manuel [1]

Salmela, Leena [1]

Makinen, Veli [1]

Tomescu, Alexandru I. [1]

Affiliations:

[1] Univ Helsinki, Dept Comp Sci, Helsinki 00014, Finland

Source:

BIOINFORMATICS | 2023 / Vol. 39 / Issue 08

Funding:

European Research Council; Academy of Finland;

Keywords:

ALGORITHMS;

DOI:

10.1093/bioinformatics/btad460

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Motivation: Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit [Garrison et al. (Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol 2018;36:875-9)] is a popular aligner of short reads, GraphAligner [Rautiainen and Marschall (GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol 2020;21:253-28)] is the state-of-the-art aligner of erroneous long reads. GraphAligner works by finding candidate read occurrences based on individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds.
Results: We present a new algorithm to co-linearly chain a set of seeds in a string labeled acyclic graph, together with the first efficient implementation of such a co-linear chaining algorithm into a new aligner of erroneous long reads to acyclic variation graphs, GraphChainer. We run experiments aligning real and simulated PacBio CLR reads with average error rates 15% and 5%. Compared to GraphAligner, GraphChainer aligns 12-17% more reads, and 21-28% more total read length, on real PacBio CLR reads from human chromosomes 1, 22, and the whole human pangenome. On both simulated and real data, GraphChainer aligns between 95% and 99% of all reads, and of total read length. We also show that minigraph [Li et al. (The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020;21:265-19.)] and minichain [Chandra and Jain (Sequence to graph alignment using gap-sensitive co-linear chaining. In: Proceedings of the 27th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2023). Springer, 2023, 58-73.)] obtain an accuracy of <60% on this setting.
Availability and implementation: GraphChainer is freely available at https://github.com/algbio/GraphChainer. The datasets and evaluation pipeline can be reached from the previous address.


Pages: 10
