Efficient Similarity Join and Search on Multi-Attribute Data

Cited by: 17
Authors
Li, Guoliang [1]
He, Jian [1]
Deng, Dong [1]
Li, Jian [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci, Beijing, Peoples R China
[2] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing, Peoples R China
Keywords
Similarity Search; Similarity Join; Multi-Attribute Data; ALGORITHM;
DOI
10.1145/2723372.2723733
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In this paper we study similarity join and search on multi-attribute data. Traditional methods on single-attribute data have pruning power only on single attributes and cannot efficiently support multi-attribute data. To address this problem, we propose a prefix tree index which has holistic pruning ability on multiple attributes. We propose a cost model to quantify the prefix tree which can guide the prefix tree construction. Based on the prefix tree, we devise a filter-verification framework to support similarity search and join on multi-attribute data. The filter step prunes a large number of dissimilar results and identifies some candidates using the prefix tree and the verification step verifies the candidates to generate the final answer. For similarity join, we prove that constructing an optimal prefix tree is NP-complete and develop a greedy algorithm to achieve high performance. For similarity search, since one prefix tree cannot support all possible search queries, we extend the cost model to support similarity search and devise a budget-based algorithm to construct multiple high-quality prefix trees. We also devise a hybrid verification algorithm to improve the verification step. Experimental results show our method significantly outperforms baseline approaches.
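The filter-verification idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's prefix tree or its cost model: it assumes each attribute is a non-empty token set compared with Jaccard similarity, assumes one threshold in (0, 1] per attribute, and swaps in a plain per-attribute prefix filter with a linear scan in place of the index. All names (`jaccard`, `prefix`, `search`, the sample records) are hypothetical.

```python
import math


def jaccard(a, b):
    """Exact Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prefix(tokens, threshold):
    """Prefix of the token set under a fixed global (lexicographic) order.

    If two sets have Jaccard >= threshold, their prefixes share a token.
    Assumes 0 < threshold <= 1.
    """
    ordered = sorted(set(tokens))
    if not ordered:
        return set()
    # Small epsilon guards against floating-point overshoot in ceil().
    keep = len(ordered) - math.ceil(threshold * len(ordered) - 1e-9) + 1
    return set(ordered[:keep])


def search(records, query, thresholds):
    """Return (answers, candidates) for a multi-attribute similarity query."""
    # Filter step: a record survives only if, on every attribute,
    # its prefix overlaps the query's prefix.
    candidates = [
        rid
        for rid, rec in enumerate(records)
        if all(prefix(rec[i], t) & prefix(query[i], t)
               for i, t in enumerate(thresholds))
    ]
    # Verification step: compute the exact similarity on every attribute.
    answers = [
        rid
        for rid in candidates
        if all(jaccard(records[rid][i], query[i]) >= t
               for i, t in enumerate(thresholds))
    ]
    return answers, candidates


# Hypothetical two-attribute records: (name tokens, address tokens).
records = [
    (["apple", "inc", "cupertino"], ["one", "infinite", "loop"]),
    (["apple", "inc"], ["one", "infinite", "loop", "ca"]),
    (["banana", "corp"], ["main", "street"]),
]
query = (["apple", "inc", "cupertino"], ["one", "infinite", "loop"])
thresholds = (0.6, 0.7)

answers, candidates = search(records, query, thresholds)
print(answers, candidates)
```

The sketch scans every record in the filter step; the approach in the abstract instead organizes prefixes from multiple attributes into a tree so that whole groups of records can be pruned at once.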
Pages: 1137 - 1151
Number of pages: 15
Related papers
50 in total
  • [1] Efficient Multi-Attribute Similarity Learning Towards Attribute-based Fashion Search
    Ak, Kenan E.
    Lim, Joo Hwee
    Tham, Jo Yew
    Kassim, Ashraf A.
    [J]. 2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1671 - 1679
  • [2] Similarity measure for multi-attribute data
    Li, CJ
    Prabhakaran, B
    Zheng, SQ
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 1149 - 1152
  • [3] Multi-Attribute Spaces: Calibration for Attribute Fusion and Similarity Search
    Scheirer, Walter J.
    Kumar, Neeraj
    Belhumeur, Peter N.
    Boult, Terrance E.
    [J]. 2012 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2012, : 2933 - 2940
  • [4] Enhancing Multi-Attribute Similarity Join using Reduced and Adaptive Index Trees
    Silva, Vitor Bezerra
    Nascimento, Dimas Cassimiro
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (07) : 4251 - 4281
  • [5] Range Search over Encrypted Multi-Attribute Data
    Falzon, Francesca
    Markatou, Evangelia Anna
    Espiritu, Zachary
    Tamassia, Roberto
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 16 (04) : 587 - 600
  • [6] Multi-attribute sequential search
    Bearden, J. Neil
    Connolly, Terry
    [J]. ORGANIZATIONAL BEHAVIOR AND HUMAN DECISION PROCESSES, 2007, 103 (01) : 147 - 158
  • [7] Similarity searching for multi-attribute sequences
    Kahveci, T
    Singh, A
    Gürel, A
    [J]. 14TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2002, : 175 - 184
  • [8] Efficient search techniques for multi-attribute bilateral negotiation strategies
    Tesauro, G
    [J]. THIRD INTERNATIONAL SYMPOSIUM ON ELECTRONIC COMMERCE, PROCEEDINGS, 2002, : 30 - 36
  • [9] A framework for efficient multi-attribute movement data analysis
    Valdés, Fabio
    Güting, Ralf Hartmut
    [J]. VLDB JOURNAL, 2019, 28 (04) : 427 - 449