Performance Evaluation of DBSCAN With Similarity Join Algorithms

Cited by: 0
Authors
Radulescu, Iulia Maria [1]
Truica, Ciprian-Octavian [1]
Apostol, Elena-Simona [1]
Boicea, Alexandru [1]
Radulescu, Florin [1]
Mocanu, Mariana [1]
Affiliations
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp, Comp Sci & Engn Dept, Bucharest, Romania
Keywords
DBSCAN; similarity join; k-d-tree indexing; QuickJoin; INDEX; SEARCH; SPACES;
DOI
Not available
Chinese Library Classification
F [Economics];
Discipline classification code
02;
Abstract
Clustering is an important data mining operation that groups objects into clusters based on their similarity. The similarity join is a primitive operation used in clustering that retrieves the most similar pairs from two input datasets according to a dissimilarity function (also called a metric). In this article, we transform the algorithmic schema of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) by replacing its multiple range queries with a single similarity join, which reduces the number of hyperparameters. Thus, instead of the two hyperparameters required by DBSCAN, our approach requires only the neighborhood radius epsilon. We propose two implementations of DBSCAN with similarity join: i) QuickDBSCAN, which uses an adapted QuickJoin algorithm, and ii) KDTreeDBSCAN, which uses a k-d-tree indexing structure. The experimental results show that DBSCAN with similarity join outperforms the classic DBSCAN.
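The idea of replacing per-point range queries with one similarity self-join can be illustrated with a minimal Python sketch. It is not the authors' QuickDBSCAN or KDTreeDBSCAN: the join here is a brute-force stand-in for QuickJoin or a k-d-tree join, and the function names `similarity_self_join` and `cluster_from_join` are hypothetical. The abstract does not say how the second hyperparameter is eliminated, so the sketch assumes that clusters are the connected components of the epsilon-neighborhood graph and that a point with no neighbor within epsilon is noise.

```python
from itertools import combinations
from math import dist


def similarity_self_join(points, eps):
    """Return all index pairs (i, j), i < j, with Euclidean distance <= eps.

    Brute force, O(n^2): a stand-in for QuickJoin or a k-d-tree-based join.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) <= eps
    ]


def cluster_from_join(points, eps):
    """Label points by connected components of the eps-neighborhood graph.

    Assumption (not taken from the paper): a point with no neighbor within
    eps is noise (label -1); every other point gets its component's label.
    """
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    has_neighbor = [False] * len(points)
    # One pass over the join result replaces the per-point range queries.
    for i, j in similarity_self_join(points, eps):
        has_neighbor[i] = has_neighbor[j] = True
        parent[find(i)] = find(j)

    labels, root_to_label = [], {}
    for idx in range(len(points)):
        if not has_neighbor[idx]:
            labels.append(-1)
            continue
        root = find(idx)
        # Number clusters in order of first appearance.
        label = root_to_label.setdefault(root, len(root_to_label))
        labels.append(label)
    return labels


if __name__ == "__main__":
    demo = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0), (9.0, 0.0)]
    print(cluster_from_join(demo, eps=0.6))  # prints [0, 0, 0, 1, 1, -1]
```

Only `eps` is passed in, mirroring the single-hyperparameter claim; swapping `similarity_self_join` for an indexed join would change the running time but not the labels under these assumptions.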
Pages: 7957-7966
Page count: 10
Related papers
50 in total
  • [21] Performance analysis of three text-join algorithms
    State Univ of New York at Binghamton, Binghamton, United States
    IEEE Trans Knowl Data Eng, 3 (477-492):
  • [22] Evaluation the Algorithms of Fuzzy Temporal Databases Join Operations
    Deng, Liguo
    Ma, Z. M.
    Zhang, Gang
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 4964 - +
  • [23] Performance analysis of three text-join algorithms
    Meng, WY
    Yu, C
    Wang, W
    Rishe, N
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1998, 10 (03) : 477 - 492
  • [24] Similarity Join and Similarity Self-Join Size Estimation in a Streaming Environment
    Rafiei, Davood
    Deng, Fan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (04) : 768 - 781
  • [25] An empirical evaluation of exact set similarity join techniques using GPUs
    Bellas, Christos
    Gounaris, Anastasios
    INFORMATION SYSTEMS, 2020, 89
  • [26] An evaluation of new and old similarity ranking algorithms
    Lynch, P
    Luan, XC
    Prettyman, M
    Mericle, L
    Borkmann, E
    Schlaifer, J
    ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 2, PROCEEDINGS, 2004, : 148 - 149
  • [27] The Similarity Join Database Operator
    Silva, Yasin N.
    Aref, Walid G.
    Ali, Mohamed H.
    26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING ICDE 2010, 2010, : 892 - 903
  • [28] Similarity join in metric spaces
    Dohnal, V
    Gennaro, C
    Savino, P
    Zezula, P
    ADVANCES IN INFORMATION RETRIEVAL, 2003, 2633 : 452 - 467
  • [29] Performance Evaluation and Model Checking Join Forces
    Baier, Christel
    Haverkort, Boudewijn R.
    Hermanns, Holger
    Katoen, Joost-Pieter
    COMMUNICATIONS OF THE ACM, 2010, 53 (09) : 76 - 85
  • [30] A performance evaluation of spatial join processing strategies
    Papadopoulos, A
    Rigaux, P
    Scholl, M
    ADVANCES IN SPATIAL DATABASES, 1999, 1651 : 286 - 307