Approximate all-pairs suffix/prefix overlaps

Cited by: 10
Authors
Valimaki, Niko [1]
Ladra, Susana [2]
Makinen, Veli [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, HIIT, FIN-00014 Helsinki, Finland
[2] Univ A Coruna, Dept Comp Sci, La Coruna, Spain
Funding
Academy of Finland; European Research Council
Keywords
Suffix/prefix matching; Approximate pattern matching; ALGORITHMS; ALIGNMENT; GENOME; ULTRAFAST; TOOL;
DOI
10.1016/j.ic.2012.02.002
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of strings of total length $n$ and an error rate $\epsilon$, the goal is to find, for all pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance $k = \lceil \epsilon \ell \rceil$, where $\ell$ is the length of the overlap. We propose a new solution for this problem based on backward backtracking (Lam et al., 2008) and suffix filters (Karkkainen and Na, 2008). Our technique uses $nH_k + o(n \log \sigma) + r \log r$ bits of space, where $H_k$ is the $k$-th order entropy and $\sigma$ the alphabet size. In practice, it is more scalable in terms of space than q-gram filters (Rasmussen et al., 2006), and comparable in terms of time. Our method is also easy to parallelize and scales up to millions of DNA reads. © 2012 Elsevier Inc. All rights reserved.
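
To make the problem statement in the abstract concrete, the following is a minimal brute-force sketch for a single pair of strings. It is not the backward backtracking or suffix filter method the paper proposes; it only illustrates which (suffix, prefix) pairs count as overlaps under the threshold k = ceil(epsilon * l). The function names, the `min_len` parameter, and the choice to measure the overlap length l as the length of the suffix are assumptions made for this illustration.

```python
import math


def edit_distance_row(s, t):
    """Return a list d where d[j] is the edit distance between s and t[:j]."""
    prev = list(range(len(t) + 1))  # distance from the empty string to t[:j]
    for i, cs in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # match or substitute
        prev = cur
    return prev


def approx_overlaps(a, b, eps, min_len=1):
    """Naively list overlaps between suffixes of a and prefixes of b.

    Returns tuples (suffix_len, prefix_len, edits) with
    edits <= ceil(eps * suffix_len). Assumption: the overlap length l
    is taken to be the suffix length. Cubic time; for illustration only.
    """
    hits = []
    for start in range(len(a)):
        suffix = a[start:]
        l = len(suffix)
        if l < min_len:
            continue
        k = math.ceil(eps * l)
        row = edit_distance_row(suffix, b)
        for prefix_len, d in enumerate(row):
            if prefix_len >= min_len and d <= k:
                hits.append((l, prefix_len, d))
    return hits
```

Calling `approx_overlaps(a, b, eps)` for every ordered pair of reads would give the all-pairs answer, but at a cost that is impractical for large read sets, which is the motivation for index-based approaches such as the one described in the abstract.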
Pages: 49-58
Page count: 10