Tree edit distance: Robust and memory-efficient

Cited by: 100
Authors
Pawlik, Mateusz [1]
Augsten, Nikolaus [1]
Affiliations
[1] Salzburg Univ, Dept Comp Sci, A-5020 Salzburg, Austria
Keywords
Tree edit distance; Similarity search; Approximate matching; ALGORITHMS; DOCUMENTS;
DOI
10.1016/j.is.2015.08.004
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Hierarchical data are often modelled as trees. An interesting query identifies pairs of similar trees. The standard approach to tree similarity is the tree edit distance, which has successfully been applied in a wide range of applications. In terms of runtime, the state-of-the-art algorithm for the tree edit distance is RTED, which is guaranteed to be fast independent of the tree shape. Unfortunately, this algorithm requires up to twice the memory of its competitors. The memory is quadratic in the tree size and is a bottleneck for the tree edit distance computation. In this paper we present a new, memory-efficient algorithm for the tree edit distance, AP-TED (All Path Tree Edit Distance). Our algorithm runs at least as fast as RTED without trading in memory efficiency. This is achieved by releasing memory early during the first step of the algorithm, which computes a decomposition strategy for the actual distance computation. We show the correctness of our approach and prove an upper bound for the memory usage. The strategy computed by AP-TED is optimal in the class of all-path strategies, which subsumes the class of LRH strategies used in RTED. We further present the AP-TED+ algorithm, which requires less computational effort for very small subtrees and improves the runtime of the distance computation. Our experimental evaluation confirms the low memory requirements and the runtime efficiency of our approach. © 2015 Elsevier Ltd. All rights reserved.
Pages: 157-173
Number of pages: 17
Related papers
50 items in total
  • [21] Memory-Efficient RkNN Retrieval by Nonlinear k-Distance Approximation
    Obermeier, Sandra
    Berrendorf, Max
    Kroger, Peer
    [J]. 2020 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2020), 2020, : 387 - 390
  • [22] A Fast, Memory-Efficient Alpha-Tree Algorithm Using Flooding and Tree Size Estimation
    You, Jiwoo
    Trager, Scott C.
    Wilkinson, Michael H. F.
    [J]. MATHEMATICAL MORPHOLOGY AND ITS APPLICATIONS TO SIGNAL AND IMAGE PROCESSING, ISMM 2019, 2019, 11564 : 256 - 267
  • [23] An Optimal Decomposition Algorithm for Tree Edit Distance
    Demaine, Erik D.
    Mozes, Shay
    Rossman, Benjamin
    Weimann, Oren
    [J]. ACM TRANSACTIONS ON ALGORITHMS, 2009, 6 (01)
  • [24] Graph Edit Distance Compacted Search Tree
    Chegrane, Ibrahim
    Hocine, Imane
    Yahiaoui, Said
    Bendjoudi, Ahcene
    Nouali-Taboudjemat, Nadia
    [J]. SIMILARITY SEARCH AND APPLICATIONS (SISAP 2022), 2022, 13590 : 181 - 189
  • [25] Decomposition algorithms for the tree edit distance problem
    Dulucq, Serge
    Touzet, Helene
    [J]. JOURNAL OF DISCRETE ALGORITHMS, 2005, 3 (2-4) : 448 - 471
  • [26] Analyzing edit distance on trees: Tree swap distance is intractable
    Department of Computing Science, Umeå University, 90187 Umeå, Sweden
    [J]. Proc. Prag. Str. Conf., (59-73):
  • [27] Tree Edit Distance and Maximum Agreement Subtree
    Shin, Kilho
    [J]. INFORMATION PROCESSING LETTERS, 2015, 115 (01) : 69 - 73
  • [28] A survey on tree edit distance and related problems
    Bille, P
    [J]. THEORETICAL COMPUTER SCIENCE, 2005, 337 (1-3) : 217 - 239
  • [29] An optimal decomposition algorithm for tree edit distance
    Demaine, Erik D.
    Mozes, Shay
    Rossman, Benjamin
    Weimann, Oren
    [J]. AUTOMATA, LANGUAGES AND PROGRAMMING, PROCEEDINGS, 2007, 4596 : 146 - +
  • [30] Tree edit distance from information theory
    Torsello, A
    Hancock, ER
    [J]. GRAPH BASED REPRESENTATIONS IN PATTERN RECOGNITION, PROCEEDINGS, 2003, 2726 : 71 - 82