Bounded Occurrence Edit Distance: A New Metric for String Similarity Joins with Edit Distance Constraints

Cited by: 0
Authors
Komatsu, Tomoki [1]
Okuta, Ryosuke [1]
Narisawa, Kazuyuki [1]
Shinohara, Ayumi [1]
Affiliation
[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 980, Japan
Keywords
Edit distance; Similarity join problem; Similarity search; Data integration
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Given two sets of strings and a similarity function on strings, similarity joins attempt to find all similar pairs of strings from each respective set. In this paper, we focus on similarity joins with respect to the edit distance, and propose a new metric called the bounded occurrence edit distance and a filter based on the metric. Using the filter, we can reduce the total time required to solve similarity joins because the metric can be computed faster than the edit distance by bitwise operations. We demonstrate the effectiveness of the filter through experiments.
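The filter-then-verify idea in the abstract can be illustrated with a minimal sketch. This record does not define the bounded occurrence edit distance, so the sketch below does not implement it. As an assumed stand-in, it uses a simpler lower bound on the edit distance built from character-occurrence bitmasks and string lengths. All function names (`char_mask`, `lower_bound`, `similarity_join`) are hypothetical and chosen for illustration only.

```python
def char_mask(s):
    """Bitmask with one bit set per distinct character of s."""
    mask = 0
    for ch in s:
        mask |= 1 << ord(ch)
    return mask


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def lower_bound(x, y, mask_x, mask_y):
    """Cheap lower bound on the edit distance between x and y.

    Assumed stand-in filter, NOT the paper's metric. Each edit operation
    removes at most one character from the set of occurring characters
    and adds at most one, and changes the length by at most one.
    """
    only_x = popcount(mask_x & ~mask_y)  # characters that must disappear
    only_y = popcount(mask_y & ~mask_x)  # characters that must appear
    return max(abs(len(x) - len(y)), only_x, only_y)


def edit_distance(x, y):
    """Standard dynamic-programming edit distance (unit costs)."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        cur = [i]
        for j, cy in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cx != cy)))  # substitution or match
        prev = cur
    return prev[-1]


def similarity_join(left, right, k):
    """Return all pairs (x, y) with edit distance at most k."""
    right_masks = [(y, char_mask(y)) for y in right]
    result = []
    for x in left:
        mask_x = char_mask(x)
        for y, mask_y in right_masks:
            # Filter step: skip pairs whose lower bound already exceeds k.
            if lower_bound(x, y, mask_x, mask_y) > k:
                continue
            # Verification step: compute the exact edit distance.
            if edit_distance(x, y) <= k:
                result.append((x, y))
    return result
```

Because the filter never overestimates the edit distance, it discards only pairs that cannot qualify, so the join result is the same as verifying every pair directly. How much time it saves depends on how many pairs the filter rejects.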
Pages: 363-374
Page count: 12