Efficient Top-k Approximate Subtree Matching in Small Memory

Cited by: 10
Authors
Augsten, Nikolaus [1]
Barbosa, Denilson [2]
Boehlen, Michael M. [3]
Palpanas, Themis [4]
Affiliations
[1] Free Univ Bozen Bolzano, Fac Comp Sci, I-39100 Bozen Bolzano, Italy
[2] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
[3] Univ Zurich, Dept Informat, CH-8050 Zurich, Switzerland
[4] Univ Trento, Dept Informat Engn & Comp Sci, I-38050 Povo, Italy
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Approximate subtree matching; tree edit distance; top-k queries; XML; subtree pruning; similarity search; ALGORITHMS;
DOI
10.1109/TKDE.2010.245
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We consider the Top-k Approximate Subtree Matching (TASM) problem: finding the k best matches of a small query tree within a large document tree using the canonical tree edit distance as a similarity measure between subtrees. Evaluating the tree edit distance for large XML trees is difficult: the best known algorithms have cubic runtime and quadratic space complexity, and, thus, do not scale. Our solution is TASM-postorder, a memory-efficient and scalable TASM algorithm. We prove an upper bound for the maximum subtree size for which the tree edit distance needs to be evaluated. The upper bound depends on the query and is independent of the document size and structure. A core problem is to efficiently prune subtrees that are above this size threshold. We develop an algorithm based on the prefix ring buffer that allows us to prune all subtrees above the threshold in a single postorder scan of the document. The size of the prefix ring buffer is linear in the threshold. As a result, the space complexity of TASM-postorder depends only on k and the query size, and the runtime of TASM-postorder is linear in the size of the document. Our experimental evaluation on large synthetic and real XML documents confirms our analytic results.
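The pruning step described in the abstract can be illustrated with a small sketch. It assumes the document arrives as a postorder sequence of subtree sizes and that a size threshold `tau` is already known; the function name `candidate_subtrees` and the `(start, end)` range encoding are hypothetical choices made for this illustration. It is not the paper's prefix ring buffer: it keeps a plain stack of pending candidates, so it does not reproduce the space bound stated above. It only shows which subtrees survive when everything above the threshold is discarded in one pass.

```python
from typing import Iterable, Iterator, List, Tuple


def candidate_subtrees(sizes: Iterable[int], tau: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) postorder ranges of maximal subtrees with size <= tau.

    `sizes` lists the subtree size of every node in postorder. Positions are
    1-based, so the subtree rooted at position `end` spans
    end - size + 1 .. end. Illustrative only: assumes a well-formed tree.
    """
    # Pending candidates are disjoint ranges kept in increasing start order.
    pending: List[Tuple[int, int]] = []
    for end, size in enumerate(sizes, start=1):
        start = end - size + 1
        if size <= tau:
            # This subtree is small enough; it absorbs any pending
            # candidates that are its descendants.
            while pending and pending[-1][0] >= start:
                pending.pop()
            pending.append((start, end))
        else:
            # This subtree is too large and is pruned. Its pending
            # descendants can no longer be absorbed, so they are maximal.
            cut = len(pending)
            while cut > 0 and pending[cut - 1][0] >= start:
                cut -= 1
            yield from pending[cut:]
            del pending[cut:]
    # Whatever is left has no ancestor above the threshold (e.g. the root).
    yield from pending


# Example tree in postorder: b, c, a(b, c), e, f, g, d(e, f, g), root(a, d)
example_sizes = [1, 1, 3, 1, 1, 1, 4, 8]
print(list(candidate_subtrees(example_sizes, tau=3)))
# -> [(4, 4), (5, 5), (6, 6), (1, 3)]
```

With `tau=3`, the subtree rooted at `d` (size 4) and the whole document (size 8) are pruned, leaving the three leaves under `d` and the subtree rooted at `a` as the only ranges on which a tree edit distance would still need to be evaluated.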
Pages: 1123-1137
Page count: 15