An Empirical Evaluation of Set Similarity Join Techniques

Cited by: 0
Authors
Mann, Willi [1]
Augsten, Nikolaus [1]
Bouros, Panagiotis [2]
Affiliations
[1] Salzburg Univ, Salzburg, Austria
[2] Aarhus Univ, Aarhus, Denmark
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2016 / Vol. 9 / Issue 09
Funding
Austrian Science Fund;
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Set similarity joins compute all pairs of similar sets from two collections of sets. We conduct extensive experiments on seven state-of-the-art algorithms for set similarity joins. These algorithms adopt a filter-verification approach. Our analysis shows that verification has not received enough attention in previous works. In practice, efficient verification inspects only a small, constant number of set elements and is faster than some of the more sophisticated filter techniques. Although we can identify three winners, we find that most algorithms show very similar performance. The key technique is the prefix filter, and AllPairs, the first algorithm adopting this technique, is still a relevant competitor. We repeat experiments from previous work and discuss diverging results. All our claims are supported by a detailed analysis of the factors that determine the overall runtime.
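As a rough illustration of the filter-verification approach and the prefix filter named in the abstract, the following Python sketch performs a self-join under a Jaccard threshold. It is not the implementation evaluated in the paper: the function names `jaccard` and `prefix_filter_self_join`, the choice of Jaccard similarity, the rare-token-first ordering, and the plain (non-early-terminating) verification step are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import ceil


def jaccard(a, b):
    """Jaccard similarity of two non-empty Python sets."""
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def prefix_filter_self_join(sets, threshold):
    """Return sorted (i, j, sim) triples with i < j and Jaccard >= threshold.

    Hypothetical sketch: candidates are generated with a prefix filter and a
    length filter, then verified by computing the exact similarity.
    Assumes 0 < threshold <= 1 and mutually comparable tokens.
    """
    # Global token order: rare tokens first, ties broken by token value.
    freq = Counter(tok for s in sets for tok in set(s))
    ordered = [sorted(set(s), key=lambda tok: (freq[tok], tok)) for s in sets]

    # Process sets by increasing size so every indexed set is no larger
    # than the set currently probing the index.
    by_size = sorted(range(len(sets)), key=lambda i: len(ordered[i]))

    index = defaultdict(list)  # token -> ids of sets with that token in their prefix
    results = []
    for i in by_size:
        r = ordered[i]
        if not r:
            continue  # empty sets are skipped in this sketch
        # Two sets with Jaccard >= threshold must share a token within
        # their first |r| - ceil(threshold * |r|) + 1 tokens.
        # The small epsilon guards against floating-point round-up.
        prefix_len = len(r) - ceil(threshold * len(r) - 1e-9) + 1
        prefix = r[:prefix_len]

        # Filter step: collect candidates that share a prefix token and
        # pass the length filter |s| >= threshold * |r|.
        candidates = set()
        for tok in prefix:
            for j in index.get(tok, ()):
                if len(ordered[j]) >= threshold * len(r) - 1e-9:
                    candidates.add(j)

        # Verification step: compute the exact similarity of each candidate.
        r_set = set(r)
        for j in candidates:
            sim = jaccard(r_set, set(ordered[j]))
            if sim >= threshold:
                results.append((min(i, j), max(i, j), sim))

        # Index the current set under its prefix tokens only.
        for tok in prefix:
            index[tok].append(i)
    return sorted(results)


if __name__ == "__main__":
    data = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {7, 8, 9}]
    for i, j, sim in prefix_filter_self_join(data, 0.6):
        print(i, j, round(sim, 2))
```

The verification here intersects the full sets for simplicity; the abstract's observation that efficient verification inspects only a few elements would call for an early-terminating merge instead, which this sketch does not attempt.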
Pages: 636 - 647
Page count: 12
Related papers
50 in total
  • [1] An empirical evaluation of exact set similarity join techniques using GPUs
    Bellas, Christos
    Gounaris, Anastasios
    INFORMATION SYSTEMS, 2020, 89
  • [2] Distributed Streaming Set Similarity Join
    Yang, Jianye
    Zhang, Wenjie
    Wang, Xiang
    Zhang, Ying
    Lin, Xuemin
    2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020, : 565 - 576
  • [3] Set Similarity Join on Probabilistic Data
    Lian, Xiang
    Chen, Lei
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (01): : 650 - 659
  • [4] Scalable and Robust Set Similarity Join
    Christiani, Tobias
    Pagh, Rasmus
    Sivertsen, Johan
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 1240 - 1243
  • [5] Leveraging Set Relations in Exact Set Similarity Join
    Wang, Xubo
    Qin, Lu
    Lin, Xuemin
    Zhang, Ying
    Chang, Lijun
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (09): : 925 - 936
  • [6] Leveraging set relations in exact and dynamic set similarity join
    Xubo Wang
    Lu Qin
    Xuemin Lin
    Ying Zhang
    Lijun Chang
    The VLDB Journal, 2019, 28 : 267 - 292
  • [7] Leveraging set relations in exact and dynamic set similarity join
    Wang, Xubo
    Qin, Lu
    Lin, Xuemin
    Zhang, Ying
    Chang, Lijun
    VLDB JOURNAL, 2019, 28 (02): : 267 - 292
  • [8] SSTR: Set Similarity Join over Stream Data
    Pacifico, Lucas
    Ribeiro, Leonardo Andrade
    PROCEEDINGS OF THE 22ND INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS (ICEIS), VOL 1, 2020, : 52 - 60
  • [9] Incorporating Clustering into Set Similarity Join Algorithms: The SjClust Framework
    Ribeiro, Leonardo Andrade
    Cuzzocrea, Alfredo
    Alves Bezerra, Karen Aline
    Bahia do Nascimento, Ben Hur
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2016, PT I, 2016, 9827 : 185 - 204
  • [10] Dynamic Set Similarity Join: An Update Log Based Approach
    Yang, Chengcheng
    Chen, Lisi
    Wang, Hao
    Shang, Shuo
    Mao, Rui
    Zhang, Xiangliang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (04) : 3727 - 3741