Performance Evaluation of DBSCAN With Similarity Join Algorithms

Cited by: 0
Authors
Radulescu, Iulia Maria [1]
Truica, Ciprian-Octavian [1]
Apostol, Elena-Simona [1]
Boicea, Alexandru [1]
Radulescu, Florin [1]
Mocanu, Mariana [1]
Affiliations
[1] Univ Politehn Bucuresti, Fac Automat Control & Comp, Comp Sci & Engn Dept, Bucharest, Romania
Keywords
DBSCAN; similarity join; k-d-tree indexing; QuickJoin; INDEX; SEARCH; SPACES;
DOI
Not available
Chinese Library Classification number
F [Economics];
Subject classification code
02;
Abstract
Clustering is an important data mining operation that groups objects into clusters based on their similarity. The similarity join is a primitive operation used in clustering that retrieves the most similar pairs from two input data sets based on a dissimilarity function (also called a metric). In this article, we transform the algorithmic schema of DBSCAN (Density-Based Spatial Clustering of Applications with Noise) by replacing the multiple range queries with a single similarity join, which reduces the number of hyperparameters. Thus, instead of the two hyperparameters required by DBSCAN, our approach requires only the neighborhood radius, epsilon. We propose two implementations of DBSCAN with similarity join: i) QuickDBSCAN, which uses an adapted QuickJoin algorithm, and ii) KDTreeDBSCAN, which uses a k-d-tree indexing structure. The experimental results show that DBSCAN with similarity join outperforms the classic DBSCAN.
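The idea of replacing per-point range queries with one similarity self-join can be illustrated with a minimal Python sketch. It is not the authors' QuickDBSCAN or KDTreeDBSCAN: the brute-force join below only stands in for QuickJoin or a k-d-tree join, and the clustering rule (connected components of the epsilon-join graph, with unjoined points treated as noise) is an assumption made for illustration, since the abstract does not spell out the exact rule. The function names `similarity_self_join` and `cluster_from_join` are hypothetical.

```python
from itertools import combinations
from math import dist


def similarity_self_join(points, eps):
    """Return all index pairs (i, j), i < j, whose distance is <= eps.

    Brute-force O(n^2) stand-in for QuickJoin or a k-d-tree join
    (an assumption made to keep the sketch self-contained).
    """
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) <= eps]


def cluster_from_join(points, eps):
    """Label points by connected component of the eps-join graph.

    Assumption: a point with no join partner is noise (label -1).
    Only eps is needed; no second hyperparameter is used.
    """
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # One join replaces the repeated range queries of classic DBSCAN.
    pairs = similarity_self_join(points, eps)
    for i, j in pairs:
        parent[find(i)] = find(j)

    joined = {i for pair in pairs for i in pair}
    labels, root_to_id = [], {}
    for i in range(len(points)):
        if i not in joined:
            labels.append(-1)
            continue
        root = find(i)
        # Number clusters in order of first appearance.
        labels.append(root_to_id.setdefault(root, len(root_to_id)))
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (5.0, 5.0), (5.0, 6.0), (9.0, 0.0)]
labels = cluster_from_join(points, eps=1.5)
print(labels)
```

In this toy data set the first three points and the next two form two groups, and the last point has no neighbor within epsilon. Swapping the brute-force join for an indexed join changes only `similarity_self_join`; the clustering step consumes the pair list unchanged.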
Pages: 7957-7966
Number of pages: 10
Related papers
50 in total
  • [41] Structural join and staircase join algorithms of sibling relationship
    Wan, Chang-Xuan
    Liu, Xi-Ping
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2007, 22 (02) : 171 - 181
  • [42] DESIGN AND EVALUATION OF ALGORITHMS FOR IMAGE RETRIEVAL BY SPATIAL SIMILARITY
    GUDIVADA, VN
    RAGHAVAN, VV
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1995, 13 (02) : 115 - 144
  • [43] Probabilistic similarity join on uncertain data
    Kriegel, HP
    Kunath, P
    Pfeifle, M
    Renz, M
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2006, 3882 : 295 - 309
  • [44] Multidimensional Similarity Join Using MapReduce
    Li, Ye
    Wang, Jian
    Hou, Leong U.
    WEB-AGE INFORMATION MANAGEMENT, PT II, 2016, 9659 : 457 - 468
  • [45] Continuous Similarity Join on Data Streams
    Cui, Jia
    Wang, Weiping
    Meng, Dan
    Liu, Zhenyan
    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014, : 552 - 559
  • [46] Efficient similarity join for certain graphs
    Ruan, Qunsheng
    Wu, Qingfeng
    Liu, Xiling
    Miao, Fengyu
    Wang, Yingdong
    MICROSYSTEM TECHNOLOGIES-MICRO-AND NANOSYSTEMS-INFORMATION STORAGE AND PROCESSING SYSTEMS, 2021, 27 (04): : 1665 - 1685
  • [47] Distributed Streaming Set Similarity Join
    Yang, Jianye
    Zhang, Wenjie
    Wang, Xiang
    Zhang, Ying
    Lin, Xuemin
    2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020, : 565 - 576
  • [48] String similarity search and join: a survey
    Minghe Yu
    Guoliang Li
    Dong Deng
    Jianhua Feng
    Frontiers of Computer Science, 2016, 10 : 399 - 417
  • [49] On the Complexity of Inner Product Similarity Join
    Ahle, Thomas D.
    Pagh, Rasmus
    Razenshteyn, Ilya
    Silvestri, Francesco
    PODS'16: PROCEEDINGS OF THE 35TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2016, : 151 - 164
  • [50] Incremental processing for string similarity join
    Yan, Cairong
    Zhu, Bin
    Gan, Yanglan
    Xu, Guangwei
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2019, 20 (02) : 255 - 268