LES3: Learning-based Exact Set Similarity Search

Cited by: 3
Authors
Li, Yifan [1]
Yu, Xiaohui [1]
Koudas, Nick [2]
Affiliations
[1] York Univ, Toronto, ON, Canada
[2] Univ Toronto, Toronto, ON, Canada
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2021 / Vol. 14 / No. 11
Keywords
FRAMEWORK; JOINS;
DOI
10.14778/3476249.3476263
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Set similarity search is a problem of central interest to a wide variety of applications, such as data cleaning and web search. Past approaches to set similarity search use either heavy indexing structures, which incur large search costs, or indexes that produce large candidate sets. In this paper, we design a learning-based exact set similarity search approach, LES3. Our approach first partitions sets into groups and then uses a lightweight, bitmap-like indexing structure, called the token-group matrix (TGM), to organize the groups and prune candidates for a given query set. To optimize pruning with the TGM, we analytically investigate the optimal partitioning strategy under certain distributional assumptions. Using these results, we then design a learning-based partitioning approach, called L2P, and an associated data representation encoding, PTR, to identify the partitions. We conduct extensive experiments on real and synthetic datasets to study LES3 thoroughly, establishing its effectiveness and its superiority over other applicable approaches.
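The abstract describes the token-group matrix only at a high level. The following is a minimal Python sketch of the general idea, assuming a Jaccard-threshold query and a partition of the sets into groups that is already given (the L2P learner and the PTR encoding are not reproduced). The class name TokenGroupMatrix, its methods, and the pruning bound |Q ∩ T_g| / |Q| (where T_g is the union of the tokens of all sets in group g) are illustrative choices, not necessarily the structure or bound used in LES3. It shows how one bitmap row per group lets whole groups be skipped while the search stays exact.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard similarity |a ∩ b| / |a ∪ b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class TokenGroupMatrix:
    """Hypothetical TGM sketch: one bitmap row per group, where bit i is set
    if token i occurs in at least one set of that group."""

    def __init__(self, sets: List[Set[str]], groups: List[int]) -> None:
        # groups[k] is the precomputed group id of sets[k]. The grouping is an
        # input here; LES3 learns it with L2P, which this sketch does not model.
        self.sets = sets
        self.vocab: Dict[str, int] = {}          # token -> bit position
        self.members: Dict[int, List[int]] = {}  # group id -> set indices
        self.rows: Dict[int, int] = {}           # group id -> bitmap (Python int)
        for k, (s, g) in enumerate(zip(sets, groups)):
            self.members.setdefault(g, []).append(k)
            row = self.rows.get(g, 0)
            for tok in s:
                bit = self.vocab.setdefault(tok, len(self.vocab))
                row |= 1 << bit
            self.rows[g] = row

    def query(self, q: Set[str], threshold: float) -> List[int]:
        """Return the indices of all sets S with jaccard(q, S) >= threshold."""
        if not q or not 0.0 < threshold <= 1.0:
            raise ValueError("need a non-empty query and 0 < threshold <= 1")
        # Bitmap of the query tokens that occur anywhere in the collection.
        qmask = 0
        for tok in q:
            if tok in self.vocab:
                qmask |= 1 << self.vocab[tok]
        result: List[int] = []
        for g, row in self.rows.items():
            # Every S in group g is a subset of T_g, so |Q ∩ S| <= |Q ∩ T_g|,
            # and |Q ∪ S| >= |Q|; hence jaccard(Q, S) <= |Q ∩ T_g| / |Q|.
            upper = bin(row & qmask).count("1") / len(q)
            if upper < threshold:
                continue  # prune the whole group without reading its sets
            # Surviving groups are verified exactly, so no false positives.
            for k in self.members[g]:
                if jaccard(q, self.sets[k]) >= threshold:
                    result.append(k)
        return sorted(result)


# Usage: two hand-made groups with disjoint vocabularies.
sets = [{"a", "b", "c"}, {"a", "b"}, {"x", "y"}, {"x", "y", "z"}]
tgm = TokenGroupMatrix(sets, groups=[0, 0, 1, 1])
print(tgm.query({"a", "b"}, 0.6))  # → [0, 1]
```

Because the bound never underestimates the similarity of any set in a group, pruning never discards a true result. How much work it saves depends on the partition: grouping sets with similar tokens keeps each bitmap row sparse and the bound tight, which is consistent with the abstract's emphasis on optimizing the partitioning strategy.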
Pages: 2073-2086
Number of pages: 14
Related papers (50 in total)
  • [21] Subsets and Supermajorities: Optimal Hashing-based Set Similarity Search
    Ahle, Thomas D.
    Knudsen, Jakob B. T.
    2020 IEEE 61ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS 2020), 2020 : 728 - 739
  • [22] A Review of Deep Learning-Based Binary Code Similarity Analysis
    Du, Jiang
    Wei, Qiang
    Wang, Yisen
    Sun, Xiangjie
    ELECTRONICS, 2023, 12 (22)
  • [23] A MACHINE LEARNING-BASED SURROGATE MODEL FOR SIMILARITY CRITERION OF SOLIDIFICATION
    Huang, Xixi
    Xue, Xiang
    Wang, Mingjie
    Zhu, Jihu
    Dai, Guixin
    Wu, Shiping
    INTERNATIONAL JOURNAL OF METALCASTING, 2025, 19 (01) : 353 - 362
  • [24] Comparative Object Similarity Learning-Based Robust Visual Tracking
    Yang, Weiming
    Liu, Yuliang
    Zhang, Quan
    Zheng, Yelong
    IEEE ACCESS, 2019, 7 : 50466 - 50475
  • [25] Transfer-learning-based representation learning for trajectory similarity search
    Lai, Danling
    Qu, Jianfeng
    Sang, Yu
    Chen, Xi
    GEOINFORMATICA, 2024, 28 (04) : 631 - 648
  • [26] A Dynamic Neighborhood Learning-Based Gravitational Search Algorithm
    Zhang, Aizhu
    Sun, Genyun
    Ren, Jinchang
    Li, Xiaodong
    Wang, Zhenjie
    Jia, Xiuping
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (01) : 436 - 447
  • [27] Scalable reinforcement learning-based neural architecture search
    Cassimon, Amber
    Mercelis, Siegfried
    Mets, Kevin
    NEURAL COMPUTING AND APPLICATIONS, 2025, 37 (01) : 231 - 261
  • [28] Effective Learning-Based Hybrid Search for Bandwidth Coloring
    Jin, Yan
    Hao, Jin-Kao
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2015, 45 (04) : 624 - 635
  • [29] RLPS: A Reinforcement Learning-Based Framework for Personalized Search
    Yao, Jing
    Dou, Zhicheng
    Xu, Jun
    Wen, Ji-Rong
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2021, 39 (03)
  • [30] Deep Learning-Based Inverse Scattering With Structural Similarity Loss Functions
    Huang, Youyou
    Song, Rencheng
    Xu, Kuiwen
    Ye, Xiuzhu
    Li, Chang
    Chen, Xun
    IEEE SENSORS JOURNAL, 2021, 21 (04) : 4900 - 4907