Tree edit distance: Robust and memory-efficient

Cited by: 100
Authors
Pawlik, Mateusz [1]
Augsten, Nikolaus [1]
Affiliations
[1] Salzburg Univ, Dept Comp Sci, A-5020 Salzburg, Austria
Keywords
Tree edit distance; Similarity search; Approximate matching; ALGORITHMS; DOCUMENTS;
DOI
10.1016/j.is.2015.08.004
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Hierarchical data are often modelled as trees. An interesting query identifies pairs of similar trees. The standard approach to tree similarity is the tree edit distance, which has successfully been applied in a wide range of applications. In terms of runtime, the state-of-the-art algorithm for the tree edit distance is RTED, which is guaranteed to be fast independent of the tree shape. Unfortunately, this algorithm requires up to twice the memory of its competitors. The memory is quadratic in the tree size and is a bottleneck for the tree edit distance computation. In this paper we present a new, memory efficient algorithm for the tree edit distance, AP-TED (All Path Tree Edit Distance). Our algorithm runs at least as fast as RTED without trading in memory efficiency. This is achieved by releasing memory early during the first step of the algorithm, which computes a decomposition strategy for the actual distance computation. We show the correctness of our approach and prove an upper bound for the memory usage. The strategy computed by AP-TED is optimal in the class of all-path strategies, which subsumes the class of LRH strategies used in RTED. We further present the AP-TED+ algorithm, which requires less computational effort for very small subtrees and improves the runtime of the distance computation. Our experimental evaluation confirms the low memory requirements and the runtime efficiency of our approach. (C) 2015 Elsevier Ltd. All rights reserved.
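To make the notion of tree edit distance in the abstract concrete, the following is a minimal sketch of the textbook recursive definition for ordered, labelled trees with unit costs. It is not AP-TED or RTED and makes no attempt at their runtime or memory efficiency. The tree encoding (nested `(label, children)` tuples) and the names `size`, `forest_dist`, and `tree_edit_distance` are assumptions chosen for illustration only.

```python
from functools import lru_cache

# Assumed encoding (not from the paper): a tree is (label, children),
# where children is a tuple of trees; a forest is a tuple of trees.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests.

    Always decomposes at the rightmost root of each forest; this fixed
    choice is a simplification, not one of the paper's strategies.
    """
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_dist(f[:-1] + kids_f, g) + 1
    # Insert the rightmost root of g; its children take its place.
    insert = forest_dist(f, g[:-1] + kids_g) + 1
    # Map the two roots onto each other (rename costs 1 if labels differ).
    rename = (forest_dist(f[:-1], g[:-1])
              + forest_dist(kids_f, kids_g)
              + (label_f != label_g))
    return min(delete, insert, rename)


def tree_edit_distance(t1, t2):
    """Edit distance between two trees, each treated as a one-tree forest."""
    return forest_dist((t1,), (t2,))


# Example trees: a(b, c) and a(b)
t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()),))
print(tree_edit_distance(t1, t2))
```

The memoization table here stores one entry per pair of subforests visited, which hints at why memory becomes the bottleneck that the abstract describes for large trees.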
Pages: 157-173
Number of pages: 17
Related papers
50 in total
  • [31] Efficient edit distance with duplications and contractions
    Tamar Pinhas
    Shay Zakov
    Dekel Tsur
    Michal Ziv-Ukelson
    [J]. Algorithms for Molecular Biology, 8
  • [32] Analysis of tree edit distance on XML data
    Wu, Yu-Fang
    Lin, Shu-Fen
    Yen, Hsu-Chun
    [J]. PROCEEDINGS OF THE SIXTH IASTED INTERNATIONAL CONFERENCE ON COMMUNICATIONS, INTERNET, AND INFORMATION TECHNOLOGY, 2007, : 5 - 10
  • [33] Learning probabilistic models of tree edit distance
    Bernard, Marc
    Boyer, Laurent
    Habrard, Amaury
    Sebban, Marc
    [J]. PATTERN RECOGNITION, 2008, 41 (08) : 2611 - 2629
  • [34] Tree sketch: An accurate and memory-efficient sketch for network-wide measurement
    Liu, Lei
    Ding, Tong
    Feng, Hui
    Yan, Zhongmin
    Lu, Xudong
    [J]. COMPUTER COMMUNICATIONS, 2022, 194 : 148 - 155
  • [35] Efficient edit distance with duplications and contractions
    Pinhas, Tamar
    Zakov, Shay
    Tsur, Dekel
    Ziv-Ukelson, Michal
    [J]. ALGORITHMS FOR MOLECULAR BIOLOGY, 2013, 8
  • [36] Analyzing Edit Distance on Trees: Tree Swap Distance is Intractable
    Berglund, Martin
    [J]. PROCEEDINGS OF THE PRAGUE STRINGOLOGY CONFERENCE 2011, 2011, : 59 - 73
  • [37] Graph Similarity Using Tree Edit Distance
    Dwivedi, Shri Prakash
    Srivastava, Vishal
    Gupta, Umesh
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2022, 2022, 13813 : 233 - 241
  • [38] Faster algorithms for guided tree edit distance
    Tsur, Dekel
    [J]. INFORMATION PROCESSING LETTERS, 2008, 108 (04) : 251 - 254
  • [39] Memory-efficient interconnect optimization
    Lai, MH
    Wong, DF
    [J]. PROCEEDINGS OF THE ASP-DAC 2001: ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 2001, 2001, : 198 - 202
  • [40] Fast and Memory-Efficient Approximate Minimum Spanning Tree Generation for Large Datasets
    Almansoori, Mahmood K. M.
    Meszaros, Andras
    Telek, Miklos
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024,