Efficient similarity search for tree-structured data

Times cited: 0
Authors
Li, Guoliang [1]
Liu, Xuhui [1]
Feng, Jianhua [1]
Zhou, Lizhu [1]
Affiliation
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812 (Computer Science and Technology)
Abstract
Tree-structured data are becoming ubiquitous, and manipulating them based on similarity is essential for many applications. Although similarity search on textual data has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the similarity between trees, especially for large numbers of trees. In this paper, we propose to transform tree-structured data into strings with a one-to-one mapping. We prove that the edit distance of the corresponding strings forms a bound for the similarity measures between trees, including tree edit distance, largest common subtrees and smallest common super-trees. Based on this theoretical analysis, we can employ any existing approximate string search algorithm for effective similarity search on trees. Moreover, we embed the bound into a filter-and-refine framework to facilitate similarity search on tree-structured data. The experimental results show that our algorithm achieves high performance and significantly outperforms state-of-the-art methods. Our method is especially suitable for accelerating similarity query processing on large numbers of trees in massive datasets.
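The abstract does not say which string encoding the authors use, so the sketch below is only an illustration under stated assumptions, not the paper's algorithm. It swaps in a well-known one-to-one encoding, the Euler string (emit a node's label on entering it and a close marker on leaving it). For unit costs, deleting or inserting a node changes exactly two tokens of this string and relabeling changes one, so the string edit distance is at most twice the tree edit distance. A filter-and-refine search can therefore prune any tree whose string distance to the query exceeds 2·tau and verify the survivors with an exact tree edit distance, here computed by the classic rightmost-root forest recursion. All function names (`euler_string`, `similarity_search`, and so on), the nested-tuple tree representation, and the `CLOSE` sentinel are hypothetical choices made for this example.

```python
from functools import lru_cache

# Hypothetical representation: a tree is (label, children), children a tuple of trees.
# Labels are assumed to be strings; CLOSE is a sentinel token no label can equal.
CLOSE = None


def euler_string(tree):
    """Encode an ordered labeled tree as its Euler string (a one-to-one encoding)."""
    label, children = tree
    tokens = [label]
    for child in children:
        tokens.extend(euler_string(child))
    tokens.append(CLOSE)  # marks leaving this node
    return tokens


def string_edit_distance(a, b):
    """Unit-cost Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(prev[j] + 1,                            # delete a[i-1]
                         cur[j - 1] + 1,                         # insert b[j-1]
                         prev[j - 1] + (a[i - 1] != b[j - 1]))   # substitute or match
        prev = cur
    return prev[-1]


def forest_size(forest):
    """Total number of nodes in a forest (tuple of trees)."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_edit_distance(f, g):
    """Unit-cost ordered forest edit distance via the rightmost-root recursion."""
    if not f:
        return forest_size(g)
    if not g:
        return forest_size(f)
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    return min(
        forest_edit_distance(f[:-1] + v_kids, g) + 1,   # delete rightmost root of f
        forest_edit_distance(f, g[:-1] + w_kids) + 1,   # insert rightmost root of g
        forest_edit_distance(f[:-1], g[:-1])            # map the two rightmost roots:
        + forest_edit_distance(v_kids, w_kids)          # their child forests, plus
        + (v_label != w_label),                         # a relabel if labels differ
    )


def tree_edit_distance(t1, t2):
    return forest_edit_distance((t1,), (t2,))


def similarity_search(query, trees, tau):
    """Indices of trees with tree edit distance to query <= tau (filter, then refine)."""
    q = euler_string(query)
    results = []
    for i, t in enumerate(trees):
        # Filter: string ED <= 2 * TED, so string ED > 2 * tau proves TED > tau.
        if string_edit_distance(q, euler_string(t)) > 2 * tau:
            continue
        # Refine: exact tree edit distance only for the surviving candidates.
        if tree_edit_distance(query, t) <= tau:
            results.append(i)
    return results
```

For example, with `t1 = ('a', (('b', ()), ('c', ())))` and the chain `t3 = ('a', (('b', (('c', ()),)),))`, the string filter lets `t3` through for `tau = 1`, but the refine step rejects it because its exact tree edit distance to `t1` is 2. The memoized recursion is only practical for small trees; a real refine step would use a dedicated algorithm such as Zhang and Shasha's, and the factor-of-two bound used here is looser than whatever bound the paper proves for its own mapping.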
Pages: 131-149
Page count: 19
Related papers
50 in total
  • [1] Substructure search with tree-structured data
    Ozawa, K
    Yasuda, T
    Fujita, S
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1997, 37 (04): 688-695
  • [2] Scaling Similarity Joins over Tree-Structured Data
    Tang, Yu
    Cai, Yilun
    Mamoulis, Nikos
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2015, 8 (11): 1130-1141
  • [3] Bloom Filter Tree for Fast Search of Tree-Structured Data
    Wang, Mengyu
    Zhu, Ying
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), 2015: 18-23
  • [4] Efficient change detection in tree-structured data
    Kim, DA
    Lee, SK
    [J]. WEB AND COMMUNICATION TECHNOLOGIES AND INTERNET-RELATED SOCIAL ISSUES - HSI 2003, 2003, 2713: 675-681
  • [5] Bloom Filter Tree for Fast Search and Synchronization of Tree-Structured Data
    Wang, Mengyu
    Zhu, Ying
    [J]. PROCEEDINGS OF 2016 IEEE 17TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IEEE IRI), 2016: 401-410
  • [6] Parallel Search Processing of Tree-Structured Data in a Big Data Environment
    Li, Lingxiao
    Taniar, David
    Indrawan-Santiago, Maria
    [J]. 2017 IEEE 31ST INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA), 2017: 379-386
  • [7] Clustering of Tree-structured Data
    Lu, Na
    Wu, Yidan
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015: 1210-1215
  • [8] Tree-Structured Vector Quantization for Similarity Queries
    Wu, Hanwei
    Wang, Qiwen
    Flierl, Markus
    [J]. 2017 DATA COMPRESSION CONFERENCE (DCC), 2017: 467
  • [9] Efficient similarity search in large databases of tree structured objects
    Kailing, K
    Kriegel, HP
    Schönauer, S
    Seidl, T
    [J]. 20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2004: 835
  • [10] Branch Filtering of Tree-Structured Search Results
    Aalberg, Trond
    [J]. 2019 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL 2019), 2019: 341-342