Finding maximal exact matches in graphs

Authors
Rizzo, Nicola [1]
Cáceres, Manuel [1]
Mäkinen, Veli [1]
Affiliation
[1] University of Helsinki, Department of Computer Science, Pietari Kalmin katu 5, POB 68, Helsinki 00014, Finland
Funding
European Union Horizon 2020
Keywords
Sequence-to-graph alignment; Bidirectional BWT; r-index; Suffix tree; Founder graphs; Search; Construction; Retrieval; Sequence; Tree
DOI
10.1186/s13015-024-00255-5
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: We study the problem of finding maximal exact matches (MEMs) between a query string Q and a labeled graph G. MEMs are an important class of seeds, often used in practical seed-chain-extend alignment methods because of their strong connections to classical metrics. A principled way to speed up chaining is to limit the number of MEMs by considering only MEMs of length at least $\kappa$ ($\kappa$-MEMs). However, on arbitrary input graphs, the problem of finding MEMs cannot be solved in truly sub-quadratic time under the Strong Exponential Time Hypothesis (SETH) (Equi et al., TALG 2023), even on acyclic graphs.

Results: In this paper we show an $O(n \cdot L \cdot d^{L-1} + m + M_{\kappa,L})$-time algorithm finding all $\kappa$-MEMs between Q and G spanning exactly L nodes in G, where n is the total length of node labels, d is the maximum degree of a node in G, $m = |Q|$, and $M_{\kappa,L}$ is the number of output MEMs. We use this algorithm to develop a $\kappa$-MEM finding solution on indexable Elastic Founder Graphs (Equi et al., Algorithmica 2022) running in time $O(nH^2 + m + M_\kappa)$, where H is the maximum number of nodes in a block, and $M_\kappa$ is the total number of $\kappa$-MEMs. Our results generalize to the analysis of multiple query strings (MEMs between G and any of the strings).
Additionally, we provide experimental results showing that the number of graph MEMs is an order of magnitude smaller than the number of string MEMs of the corresponding concatenated collection.

Conclusions: We show that seed-chain-extend alignment methods can be implemented on top of indexable Elastic Founder Graphs by providing an efficient way to produce the seeds between a set of queries and the graph. The code is available at https://github.com/algbio/efg-mems.
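The abstract does not spell out what a MEM between Q and G is, so the following Python sketch assumes one plausible definition: a $\kappa$-MEM spanning exactly L nodes is a query interval Q[x:y] of length at least $\kappa$ that equals a substring of the concatenated labels of a walk of L nodes, starts inside the first node, ends inside the last, and cannot be extended by one matching character on either side, including into any in-neighbor of the first node or out-neighbor of the last. It is a brute-force illustration, not the authors' $O(n \cdot L \cdot d^{L-1} + m + M_{\kappa,L})$ algorithm: it enumerates every walk of L nodes (at most $d^{L-1}$ per start node, with d the maximum out-degree) and rescans Q for each one. The names `walks`, `lce`, and `graph_kappa_mems` are hypothetical, and node labels are assumed non-empty.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# (x, y, walk, i): Q[x:y] matches the walk's label string starting at offset i
# of the first node. An assumed representation, chosen for illustration.
Mem = Tuple[int, int, Tuple[int, ...], int]


def walks(out_adj: Dict[int, List[int]], nodes: Iterable[int], L: int) -> Iterator[Tuple[int, ...]]:
    """Yield every walk of exactly L nodes (nodes may repeat if G has cycles)."""
    stack = [(v,) for v in nodes]
    while stack:
        w = stack.pop()
        if len(w) == L:
            yield w
        else:
            for nxt in out_adj.get(w[-1], []):
                stack.append(w + (nxt,))


def lce(a: str, p: int, b: str, q: int) -> int:
    """Length of the longest common prefix of a[p:] and b[q:]."""
    k = 0
    while p + k < len(a) and q + k < len(b) and a[p + k] == b[q + k]:
        k += 1
    return k


def graph_kappa_mems(Q: str, labels: Dict[int, str], out_adj: Dict[int, List[int]],
                     kappa: int, L: int) -> List[Mem]:
    """Brute-force kappa-MEMs spanning exactly L nodes, under the assumed definition."""
    if L < 1:
        raise ValueError("L must be at least 1")
    in_adj: Dict[int, List[int]] = {v: [] for v in labels}
    for u, succ in out_adj.items():
        for v in succ:
            in_adj[v].append(u)
    mems: List[Mem] = []
    for walk in walks(out_adj, labels, L):
        S = "".join(labels[v] for v in walk)
        last_start = len(S) - len(labels[walk[-1]])  # offset of the last node in S
        for x in range(len(Q)):
            for i in range(len(labels[walk[0]])):  # the match starts inside the first node
                k = lce(Q, x, S, i)  # only the full right extension can be right-maximal
                j = i + k - 1  # last matched offset in S
                if k < max(kappa, 1) or j < last_start:
                    continue  # too short, or does not reach the last node
                if x > 0:  # left-maximality, inside the node or across in-edges
                    c = Q[x - 1]
                    if i > 0 and S[i - 1] == c:
                        continue
                    if i == 0 and any(labels[v][-1] == c for v in in_adj[walk[0]]):
                        continue
                y = x + k  # exclusive end of the match in Q
                # Right-maximality: if Q is exhausted or lce stopped at a mismatch
                # inside S, nothing to check; at the end of the last node, no
                # out-neighbor may continue the match.
                if y < len(Q) and j == len(S) - 1:
                    if any(labels[w][0] == Q[y] for w in out_adj.get(walk[-1], [])):
                        continue
                mems.append((x, y, walk, i))
    return mems


# Toy graph: node 0 ("AC") branches to node 1 ("GT") and node 2 ("GA").
labels = {0: "AC", 1: "GT", 2: "GA"}
out_adj = {0: [1, 2]}
print(sorted(graph_kappa_mems("ACGT", labels, out_adj, kappa=1, L=2)))
# [(0, 3, (0, 2), 0), (0, 4, (0, 1), 0)]
```

On this toy graph the query ACGT yields one MEM covering all of Q along the walk 0→1 and one of length 3 along 0→2, where the mismatch between T and A stops the extension.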
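For the string side of the experimental comparison, the following brute-force Python sketch lists $\kappa$-MEMs between Q and a single text T as (x, p, length) triples: Q[x:x+k] equals T[p:p+k] and cannot be extended by a matching character on either side. It is for illustration only; the keywords (bidirectional BWT, r-index, suffix tree) point to index-based methods for this task. The `#` separator used to concatenate the collection and the name `string_kappa_mems` are assumptions of this sketch, not details taken from the paper.

```python
from typing import List, Tuple


def string_kappa_mems(Q: str, T: str, kappa: int) -> List[Tuple[int, int, int]]:
    """Brute-force kappa-MEMs between Q and T as (x, p, length) triples."""
    mems = []
    for x in range(len(Q)):
        for p in range(len(T)):
            # Left-maximal: at the start of Q or T, or the preceding characters differ.
            if x > 0 and p > 0 and Q[x - 1] == T[p - 1]:
                continue
            # Extend right as far as possible, which makes the match right-maximal.
            k = 0
            while x + k < len(Q) and p + k < len(T) and Q[x + k] == T[p + k]:
                k += 1
            if k >= max(kappa, 1):
                mems.append((x, p, k))
    return mems


# A toy "concatenated collection": three identical sequences joined by '#',
# a separator assumed not to occur in Q, so no MEM can cross a boundary.
collection = ["ACGTAC", "ACGTAC", "ACGTAC"]
T = "#".join(collection)
print(string_kappa_mems("CGTA", T, kappa=4))  # [(0, 1, 4), (0, 8, 4), (0, 15, 4)]
```

Each copy of the shared segment CGTA is reported as its own MEM, so on this toy input the MEM count grows with the number of copies in the concatenated collection.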
Pages: 17