Approximate all-pairs suffix/prefix overlaps

Cited by: 10
Authors
Valimaki, Niko [1]
Ladra, Susana [2]
Makinen, Veli [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, HIIT, FIN-00014 Helsinki, Finland
[2] Univ A Coruna, Dept Comp Sci, La Coruna, Spain
Funding
Academy of Finland; European Research Council;
Keywords
Suffix/prefix matching; Approximate pattern matching; ALGORITHMS; ALIGNMENT; GENOME; ULTRAFAST; TOOL;
DOI
10.1016/j.ic.2012.02.002
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of strings of total length $n$ and an error rate $\epsilon$, the goal is to find, for all pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance $k = \lceil \epsilon \ell \rceil$, where $\ell$ is the length of the overlap. We propose a new solution for this problem based on backward backtracking (Lam et al., 2008) and suffix filters (Karkkainen and Na, 2008). Our technique uses $nH_k + o(n \log \sigma) + r \log r$ bits of space, where $H_k$ is the $k$-th order entropy and $\sigma$ the alphabet size. In practice, it is more scalable in terms of space than q-gram filters (Rasmussen et al., 2006), and comparable to them in terms of time. Our method is also easy to parallelize and scales up to millions of DNA reads. © 2012 Elsevier Inc. All rights reserved.
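To make the problem statement concrete, the following is a minimal brute-force sketch of the overlap definition for a single pair of strings. It is not the backward-backtracking or suffix-filter method the abstract proposes; it is a plain dynamic-programming baseline. The function names, the `min_len` parameter, and the choice to measure the overlap length $\ell$ on the suffix side are all assumptions made for illustration.

```python
import math


def edit_distance_to_prefixes(s, t):
    """Return a list row where row[j] is the edit distance between s and t[:j]."""
    prev = list(range(len(t) + 1))  # distance from "" to t[:j] is j
    for i, cs in enumerate(s, 1):
        cur = [i]  # distance from s[:i] to "" is i
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # match or substitute
        prev = cur
    return prev


def approx_overlaps(a, b, eps, min_len=1):
    """List (suffix_len, prefix_len, dist) for suffixes of a and prefixes of b.

    Assumption: the overlap length l is the suffix length, so the
    threshold is k = ceil(eps * l). min_len is a hypothetical cutoff
    that suppresses trivially short overlaps.
    """
    found = []
    for l in range(min_len, len(a) + 1):
        k = math.ceil(eps * l)
        row = edit_distance_to_prefixes(a[-l:], b)
        for j, d in enumerate(row):
            if d <= k:
                found.append((l, j, d))
    return found


if __name__ == "__main__":
    # Exact overlap "TACA" when no errors are allowed.
    print(approx_overlaps("GATTACA", "TACAGG", 0.0))
    # One substitution is tolerated at eps = 0.25 for length-4 overlaps.
    print(approx_overlaps("GATTACA", "TAGAGG", 0.25, min_len=4))
```

This baseline takes time roughly cubic in the string lengths for one pair, which is why index-based filtering of the kind the abstract describes is needed at the scale of millions of reads.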
Pages: 49-58
Page count: 10