Bounded Occurrence Edit Distance: A New Metric for String Similarity Joins with Edit Distance Constraints

Authors
Komatsu, Tomoki [1]
Okuta, Ryosuke [1]
Narisawa, Kazuyuki [1]
Shinohara, Ayumi [1]
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 980, Japan
Keywords
Edit distance; Similarity join problem; Similarity search; Data integration
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Given two sets of strings and a similarity function on strings, similarity joins attempt to find all similar pairs of strings from each respective set. In this paper, we focus on similarity joins with respect to the edit distance, and propose a new metric called the bounded occurrence edit distance and a filter based on the metric. Using the filter, we can reduce the total time required to solve similarity joins because the metric can be computed faster than the edit distance by bitwise operations. We demonstrate the effectiveness of the filter through experiments.
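The filter-then-verify pattern described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not define the bounded occurrence edit distance, so the sketch substitutes a generic character-count lower bound as a stand-in filter. The function names (`edit_distance`, `count_lower_bound`, `similarity_join`) and the threshold parameter `k` are assumptions made for this example.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Exact edit (Levenshtein) distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def count_lower_bound(a: str, b: str) -> int:
    """Stand-in filter (NOT the paper's metric): a lower bound on the edit
    distance computed from character counts alone. A single edit operation
    changes each surplus by at most one, so the larger surplus can never
    exceed the true edit distance."""
    ca, cb = Counter(a), Counter(b)
    surplus_a = sum((ca - cb).values())  # characters a has in excess of b
    surplus_b = sum((cb - ca).values())  # characters b has in excess of a
    return max(surplus_a, surplus_b)


def similarity_join(left, right, k):
    """Return all pairs (s, t) with edit_distance(s, t) <= k.
    The cheap lower bound prunes pairs before the exact computation."""
    result = []
    for s in left:
        for t in right:
            if count_lower_bound(s, t) > k:
                continue  # the lower bound already exceeds k: safe to skip
            if edit_distance(s, t) <= k:
                result.append((s, t))
    return result


pairs = similarity_join(["kitten", "apple"], ["sitting", "apply", "zebra"], k=1)
print(pairs)
```

Because the filter is a lower bound on the edit distance, it never discards a pair that the exact check would accept; it only reduces how many exact computations are run. The paper's metric plays the same role but, according to the abstract, is evaluated with bitwise operations.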
Pages: 363-374
Page count: 12