Longest Common Substring with Approximately k Mismatches

Cited by: 9
Authors
Kociumaka, Tomasz [1]
Radoszewski, Jakub [1]
Starikovskaya, Tatiana [2]
Affiliations
[1] Univ Warsaw, Inst Informat, Warsaw, Poland
[2] PSL Univ, Ecole Normale Super, DIENS, Paris, France
Keywords
Randomised algorithms; String similarity measures; Longest common substring; Sketching; Locality-sensitive hashing; Binary jumbled indexing
DOI
10.1007/s00453-019-00548-x
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
In the longest common substring problem, we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one character. To circumvent this, Leimeister and Morgenstern introduced the problem of the longest common substring with k mismatches. Lately, this problem has received a lot of attention in the literature. In this paper, we first show a conditional lower bound based on the Strong Exponential Time Hypothesis (SETH), implying that there is little hope of improving existing solutions. We then introduce a new but closely related problem of the longest common substring with approximately k mismatches and use locality-sensitive hashing to show that it admits a solution with strongly subquadratic running time. We also apply these results to obtain a strongly subquadratic-time 2-approximation algorithm for the longest common substring with k mismatches problem and show conditional hardness of improving its approximation ratio.
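To make the problem statement concrete, here is a minimal sketch of a quadratic-time baseline for the longest common substring with k mismatches. It is not the algorithm from the paper (which relies on locality-sensitive hashing); it only illustrates the definition. The function name `lcs_k_mismatches` and its return format are assumptions made for this illustration.

```python
def lcs_k_mismatches(s: str, t: str, k: int) -> tuple[int, int, int]:
    """Return (length, start_in_s, start_in_t) of a longest pair of
    equal-length substrings of s and t that differ in at most k positions.

    Illustrative O(len(s) * len(t)) baseline, not the paper's method.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n, m = len(s), len(t)
    best = (0, 0, 0)
    # Each diagonal fixes an alignment offset d between positions of s and t.
    for d in range(-(m - 1), n):
        i0, j0 = max(d, 0), max(-d, 0)      # first aligned pair on this diagonal
        length = min(n - i0, m - j0)        # number of aligned pairs
        left = 0                            # left end of the sliding window
        mismatches = 0                      # mismatches inside the window
        for right in range(length):
            if s[i0 + right] != t[j0 + right]:
                mismatches += 1
            # Shrink the window until it holds at most k mismatches.
            while mismatches > k:
                if s[i0 + left] != t[j0 + left]:
                    mismatches -= 1
                left += 1
            window = right - left + 1
            if window > best[0]:
                best = (window, i0 + left, j0 + left)
    return best
```

With k = 0 this reduces to the exact longest common substring; changing a single character of the input can shrink that answer sharply, while allowing k = 1 absorbs the change, which is the robustness issue the abstract describes.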
Pages: 2633-2652
Page count: 20