Approximate all-pairs suffix/prefix overlaps

Cited by: 10
Authors
Välimäki, Niko [1]
Ladra, Susana [2]
Mäkinen, Veli [1]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, HIIT, FIN-00014 Helsinki, Finland
[2] Univ A Coruna, Dept Comp Sci, La Coruna, Spain
Funding
Academy of Finland; European Research Council
Keywords
Suffix/prefix matching; Approximate pattern matching; ALGORITHMS; ALIGNMENT; GENOME; ULTRAFAST; TOOL;
DOI
10.1016/j.ic.2012.02.002
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Finding approximate overlaps is the first phase of many sequence assembly methods. Given a set of strings of total length $n$ and an error rate $\epsilon$, the goal is to find, for all pairs of strings, their suffix/prefix matches (overlaps) that are within edit distance $k = \lceil \epsilon \ell \rceil$, where $\ell$ is the length of the overlap. We propose a new solution for this problem based on backward backtracking (Lam et al., 2008) and suffix filters (Kärkkäinen and Na, 2008). Our technique uses $nH_k + o(n \log \sigma) + r \log r$ bits of space, where $H_k$ is the $k$-th order entropy and $\sigma$ the alphabet size. In practice, it scales better in space than q-gram filters (Rasmussen et al., 2006) and is comparable in time. Our method is also easy to parallelize and scales up to millions of DNA reads. (C) 2012 Elsevier Inc. All rights reserved.
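
To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the paper's method: it uses plain dynamic programming rather than backward backtracking or suffix filters, and it is only practical for tiny inputs. The function names, the `min_len` parameter, and the choice to measure the overlap length as the length of the matched prefix are all assumptions made for illustration.

```python
from math import ceil


def prefix_distances(s, t):
    """Return [edit_distance(s, t[:j]) for j in 0..len(t)] (standard DP)."""
    prev = list(range(len(t) + 1))  # distance from "" to t[:j] is j
    for i, cs in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to "" is i
        for j, ct in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # match or substitute
            ))
        prev = cur
    return prev


def approx_overlaps(a, b, eps, min_len=1):
    """List (suffix_start_in_a, prefix_len_in_b, distance) for every
    suffix of a and prefix of b within edit distance ceil(eps * prefix_len).

    Assumption: the overlap length is taken to be the prefix length.
    """
    hits = []
    for i in range(len(a)):
        row = prefix_distances(a[i:], b)
        for j in range(min_len, len(b) + 1):
            if row[j] <= ceil(eps * j):
                hits.append((i, j, row[j]))
    return hits


def all_pairs_overlaps(reads, eps, min_len=1):
    """Run approx_overlaps on every ordered pair of distinct reads."""
    result = {}
    for x, a in enumerate(reads):
        for y, b in enumerate(reads):
            if x != y:
                hits = approx_overlaps(a, b, eps, min_len)
                if hits:
                    result[(x, y)] = hits
    return result
```

For example, `approx_overlaps("TTTACGTACG", "ACGTTCGAAA", 0.15, min_len=5)` includes the hit `(3, 7, 1)`: the suffix `ACGTACG` and the prefix `ACGTTCG` differ by one substitution, which is within the budget of $\lceil 0.15 \cdot 7 \rceil = 2$. The quadratic number of pairs and the per-pair dynamic programming are the costs that index-based approaches such as the one in the abstract aim to avoid.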
Pages: 49-58
Page count: 10