Streaming Algorithms for Embedding and Computing Edit Distance in the Low Distance Regime

Cited by: 35
Authors
Chakraborty, Diptarka [1]
Goldenberg, Elazar [2]
Koucky, Michal [2]
Affiliations
[1] Indian Inst Technol Kanpur, Dept Comp Sci & Engn, Kanpur, Uttar Pradesh, India
[2] Charles Univ Prague, Inst Comp Sci, Malostranske Namesti 25, Prague 11800 1, Czech Republic
Funding
European Research Council
Keywords
Edit distance; Hamming distance; randomized embedding; low distortion; string; kernel;
DOI
10.1145/2897518.2897577
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The Hamming and the edit metrics are two common notions of distance between pairs of strings x, y lying in the Boolean hypercube. The edit distance between x and y is defined as the minimum number of character insertions, deletions, and bit flips needed to convert x into y, whereas the Hamming distance between x and y is the number of bit flips needed to convert x into y. In this paper we study a randomized injective embedding of the edit distance into the Hamming distance with small distortion. We show a randomized embedding with quadratic distortion: for any x, y whose edit distance equals k, the Hamming distance between the embeddings of x and y is O(k^2) with high probability. For small values of k, this improves over the distortion ratio of O(log n log* n) obtained by Jowhari (2012). Moreover, the embedding output size is linear in the input size, and the embedding can be computed using a single pass over the input. We provide several applications of this embedding. Among our results is a one-pass (streaming) algorithm for edit distance that runs in space O(s) and computes the edit distance exactly up to distance s^{1/6}. This algorithm is based on a kernelization for edit distance that is of independent interest.
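The two metrics defined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's embedding or its streaming algorithm; it only uses the textbook quadratic-time dynamic program for edit distance, and the function names `hamming_distance` and `edit_distance` are chosen here for illustration.

```python
def hamming_distance(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def edit_distance(x: str, y: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    (bit flips, for binary strings) turning x into y.

    Textbook dynamic program keeping one row at a time:
    O(len(x) * len(y)) time, O(len(y)) space.
    """
    # prev[j] = edit distance between the empty prefix of x and y[:j]
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]  # distance between x[:i] and the empty prefix of y
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from x
                curr[j - 1] + 1,         # insert b into x
                prev[j - 1] + (a != b),  # match, or flip a into b
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    x, y = "0101", "1010"
    print(hamming_distance(x, y), edit_distance(x, y))
```

For x = 0101 and y = 1010, every position differs, yet deleting the leading 0 and appending a 0 converts x into y with two edits. Gaps of this kind between the two metrics are what a low-distortion embedding has to bridge.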
Pages: 712-725
Number of pages: 14