An Efficient Algorithm for Finding All Pairs k-Mismatch Maximal Common Substrings

Cited by: 2

Authors:

Thankachan, Sharma V. [1]

Chockalingam, Sriram P. [2]

Aluru, Srinivas [1]

Affiliations:

[1] Georgia Inst Technol, Sch CSE, Atlanta, GA 30332 USA

[2] Indian Inst Technol, Dept CSE, Bombay, Maharashtra, India

Source:

BIOINFORMATICS RESEARCH AND APPLICATIONS, ISBRA 2016 | 2016 / Vol. 9683

Keywords:

LINEAR-TIME CONSTRUCTION; SUFFIX-ARRAYS

DOI:

10.1007/978-3-319-38782-6_1

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

Identifying long pairwise maximal common substrings among a large set of sequences is a frequently used construct in computational biology, with applications in DNA sequence clustering and assembly. Due to errors made by sequencers, algorithms that can accommodate a small number of differences are of particular interest, but obtaining provably efficient solutions for such problems has been elusive. In this paper, we present a provably efficient algorithm with an expected run time guarantee of O(N log^k N + occ), where occ is the output size, for the following problem: Given a collection D = {S_1, S_2, ..., S_n} of n sequences of total length N, a length threshold φ and a mismatch threshold k ≥ 0, report all k-mismatch maximal common substrings of length at least φ over all pairs of sequences in D. In addition, we present a result showing the hardness of this problem.
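To make the problem statement concrete, the following is a naive brute-force reference sketch of the output being asked for. It is not the authors' O(N log^k N + occ) algorithm. It assumes that a pair of equal-length substrings is a "k-mismatch maximal common substring" when their Hamming distance is at most k and neither a one-character extension to the left nor to the right keeps the distance within k. The function names `maximal_windows_on_diagonal` and `all_pairs_k_mismatch_mcs` are hypothetical, and all coordinates are 0-based.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

# (i, j, start in S_i, start in S_j, length); all positions are 0-based.
Match = Tuple[int, int, int, int, int]


def maximal_windows_on_diagonal(a: str, b: str, x0: int, y0: int,
                                k: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end_exclusive) offsets of the maximal windows with at
    most k mismatches on the diagonal aligning a[x0 + t] with b[y0 + t]."""
    length = min(len(a) - x0, len(b) - y0)
    mismatches = [t for t in range(length) if a[x0 + t] != b[y0 + t]]
    if len(mismatches) <= k:
        # The whole diagonal fits within the mismatch budget, so it is the
        # only maximal window.
        yield 0, length
        return
    # Sentinels act as virtual mismatches just before the start and just
    # after the end of the diagonal.
    p = [-1] + mismatches + [length]
    # Each maximal window contains exactly k mismatches and is bounded on
    # both sides by a mismatch or by the end of the diagonal.
    for j in range(len(mismatches) - k + 1):
        yield p[j] + 1, p[j + k + 1]


def all_pairs_k_mismatch_mcs(seqs: List[str], phi: int, k: int) -> List[Match]:
    """Brute-force enumeration over every pair (i < j) and every diagonal."""
    out: List[Match] = []
    for i, j in combinations(range(len(seqs)), 2):
        a, b = seqs[i], seqs[j]
        # Diagonal d aligns a[x] with b[y] where x - y == d.
        for d in range(-(len(b) - 1), len(a)):
            x0, y0 = max(d, 0), max(-d, 0)
            for s, e in maximal_windows_on_diagonal(a, b, x0, y0, k):
                if e - s >= max(phi, 1):  # drop empty or too-short windows
                    out.append((i, j, x0 + s, y0 + s, e - s))
    return out


print(sorted(all_pairs_k_mismatch_mcs(["ACGTACGT", "ACGAACGT"], phi=4, k=1)))
# → [(0, 1, 0, 0, 8), (0, 1, 0, 4, 4), (0, 1, 4, 0, 4)]
```

Each diagonal is scanned in linear time, so this sketch costs on the order of the sum of |S_i|·|S_j| over all pairs. That quadratic cost is far from the expected O(N log^k N + occ) bound stated in the abstract, but a sketch like this can serve as a correctness oracle on small inputs.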


Pages: 3-14

Number of pages: 12
