Performance Evaluation of DBSCAN With Similarity Join Algorithms

Cited by: 0
Authors
Radulescu, Iulia Maria [1 ]
Truica, Ciprian-Octavian [1 ]
Apostol, Elena-Simona [1 ]
Boicea, Alexandru [1 ]
Radulescu, Florin [1 ]
Mocanu, Mariana [1 ]
Affiliations
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp, Comp Sci & Engn Dept, Bucharest, Romania
Keywords
DBSCAN; similarity join; k-d-tree indexing; QuickJoin; INDEX; SEARCH; SPACES;
DOI
Not available
Chinese Library Classification (CLC) number
F [Economics]
Discipline classification code
02
Abstract
Clustering is an important data mining operation that groups objects into clusters based on their similarity. The similarity join is a primitive operation used in clustering that retrieves the most similar pairs from two input datasets based on a dissimilarity function (also called a metric). In this article, we transform the algorithmic schema of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) by replacing the multiple range queries with a single similarity join, which reduces the number of hyperparameters. Thus, instead of the two hyperparameters required by classic DBSCAN, our approach requires only the neighborhood radius epsilon. We propose two implementations of DBSCAN with similarity join: i) QuickDBSCAN, which uses an adapted QuickJoin algorithm, and ii) KDTreeDBSCAN, which uses a k-d-tree indexing structure. The experimental results show that DBSCAN with similarity join outperforms the classic DBSCAN.
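The idea of replacing per-point range queries with one similarity join can be illustrated with a minimal sketch. It is not the authors' QuickDBSCAN or KDTreeDBSCAN: the join here is a brute-force stand-in for QuickJoin or a k-d-tree, the function names `epsilon_self_join` and `dbscan_from_join` are hypothetical, and the sketch keeps the classic `min_pts` parameter because the abstract does not say how the second hyperparameter is eliminated.

```python
from collections import defaultdict
from math import dist


def epsilon_self_join(points, eps):
    """Return all index pairs (i, j), i < j, whose distance is at most eps.

    Brute-force O(n^2) stand-in (an assumption) for QuickJoin or a k-d-tree join.
    """
    return [
        (i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
        if dist(points[i], points[j]) <= eps
    ]


def dbscan_from_join(points, eps, min_pts):
    """Cluster points from a single similarity join; -1 marks noise.

    min_pts is kept as in classic DBSCAN (an assumption of this sketch).
    """
    # Build every eps-neighbourhood from the one join result,
    # instead of issuing a range query per point.
    neighbours = defaultdict(set)
    for i, j in epsilon_self_join(points, eps):
        neighbours[i].add(j)
        neighbours[j].add(i)

    # A core point has at least min_pts points within eps, itself included.
    is_core = [len(neighbours[i]) + 1 >= min_pts for i in range(len(points))]

    labels = [-1] * len(points)
    cluster = 0
    for seed in range(len(points)):
        if not is_core[seed] or labels[seed] != -1:
            continue
        # Grow a new cluster from an unlabelled core point.
        labels[seed] = cluster
        stack = [seed]
        while stack:
            p = stack.pop()
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Only core points extend the cluster further.
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels
```

Calling `dbscan_from_join(points, eps, min_pts)` on a list of coordinate tuples returns one label per point. Swapping `epsilon_self_join` for an indexed join changes only how the pairs are found, not the clustering step.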
Pages: 7957 - 7966
Number of pages: 10
Related papers
50 in total
  • [1] Parallelizing String Similarity Join Algorithms
    Yao, Ling-Chih
    Lim, Lipyeow
    DATABASES THEORY AND APPLICATIONS, ADC 2018, 2018, 10837 : 322 - 327
  • [2] A near-optimal similarity join algorithm and performance evaluation
    Yang, ZW
    Yang, GQ
    INFORMATION SCIENCES, 2004, 167 (1-4) : 87 - 108
  • [3] High dimensional similarity joins: Algorithms and performance evaluation
    Koudas, N
    Sevcik, KC
    14TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1998, : 466 - 475
  • [4] High dimensional similarity joins: Algorithms and performance evaluation
    Koudas, N
    Sevcik, KC
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2000, 12 (01) : 3 - 18
  • [5] Incorporating Clustering into Set Similarity Join Algorithms: The SjClust Framework
    Ribeiro, Leonardo Andrade
    Cuzzocrea, Alfredo
    Alves Bezerra, Karen Aline
    Bahia do Nascimento, Ben Hur
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2016, PT I, 2016, 9827 : 185 - 204
  • [6] An Empirical Evaluation of Set Similarity Join Techniques
    Mann, Willi
    Augsten, Nikolaus
    Bouros, Panagiotis
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (09) : 636 - 647
  • [7] Improving Similarity Join Algorithms using Vertical Clustering Techniques
    Tan, Lisa
    Fotouhi, Farshad
    Grosky, William
    2009 SECOND INTERNATIONAL CONFERENCE ON THE APPLICATIONS OF DIGITAL INFORMATION AND WEB TECHNOLOGIES (ICADIWT 2009), 2009, : 474 - +
  • [8] Improving Similarity Join Algorithms Using Fuzzy Clustering Technique
    Tan, Lisa
    Fotouhi, Farshad
    Grosky, William
    Pop, Horia F.
    Mouaddib, Noureddine
    2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 545 - +
  • [9] SJClust: Towards a Framework for Integrating Similarity Join Algorithms and Clustering
    Ribeiro, Leonardo Andrade
    Cuzzocrea, Alfredo
    Alves Bezerra, Karen Aline
    do Nascimento, Ben Hur Bahia
    PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 1 (ICEIS), 2016, : 75 - 80
  • [10] Algorithms and performance evaluation of join processing on KD-tree indexed relations
    Kitsuregawa, Masaru, 1600, Publ by Scripta Technica Inc, New York, NY, United States (25):