DBSCAN-MS: Distributed Density-Based Clustering in Metric Spaces

Cited by: 9
Authors
Yang, Keyu [1]
Gao, Yunjun [1, 3]
Ma, Rui [1]
Chen, Lu [2]
Wu, Sai [1]
Chen, Gang [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China
[2] Aalborg Univ, Dept Comp Sci, Aalborg, Denmark
[3] Zhejiang Univ, Joint Inst Frontier Technol, Alibaba, Hangzhou, Zhejiang, Peoples R China
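Funding
National Key Research and Development Program of China
Keywords
MR-DBSCAN; algorithm
DOI
10.1109/ICDE.2019.00122
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract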
DBSCAN is one of the most important density-based clustering methods, with a wide range of applications in machine learning and data mining, among other fields. However, the rapidly growing volume and variety of today's data challenge traditional DBSCAN, which calls for a distributed DBSCAN in metric spaces. In this paper, we propose DBSCAN-MS, a distributed density-based clustering algorithm for metric spaces. To ensure load balancing, we present a k-d tree based partitioning approach: it uses pivots to map the data in a metric space into a vector space, and then applies k-d tree partitioning to divide the data evenly. To avoid unnecessary computation and communication costs, we propose a framework that divides the data into partitions, computes local DBSCAN results, and merges the local results based on a merging graph. In addition, the framework uses pivot filtering and a sliding window technique for pruning. Extensive experiments on both real and synthetic datasets demonstrate the efficiency and scalability of DBSCAN-MS.
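The pivot mapping, k-d tree partitioning, and pivot filtering described in the abstract can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (map_to_pivot_space, kd_partition, range_query, local_dbscan) are hypothetical, the split axis is assumed to cycle through the pivot coordinates, the pruning rule is assumed to be the standard triangle-inequality lower bound |d(q, p) - d(o, p)| <= d(q, o), and local_dbscan is textbook DBSCAN standing in for the per-partition step. At least one pivot is assumed.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Illustrative sketch only; all helper names are hypothetical, not from DBSCAN-MS.
Metric = Callable[[Any, Any], float]
Vec = Tuple[float, ...]


def map_to_pivot_space(data: Sequence[Any], pivots: Sequence[Any], dist: Metric) -> List[Vec]:
    """Map each metric-space object o to the vector (d(o, p_1), ..., d(o, p_k))."""
    return [tuple(dist(o, p) for p in pivots) for o in data]


def kd_partition(ids: List[int], vecs: List[Vec], num_parts: int, depth: int = 0) -> List[List[int]]:
    """k-d style split: sort on one pivot coordinate per level, then cut so each
    side gets a share of objects proportional to its share of partitions."""
    if num_parts <= 1 or len(ids) <= 1:
        return [ids]
    axis = depth % len(vecs[0])  # assumption: cycle through pivot coordinates
    ordered = sorted(ids, key=lambda i: vecs[i][axis])
    left_parts = num_parts // 2
    cut = len(ordered) * left_parts // num_parts
    return (kd_partition(ordered[:cut], vecs, left_parts, depth + 1)
            + kd_partition(ordered[cut:], vecs, num_parts - left_parts, depth + 1))


def range_query(q: int, ids: List[int], data: Sequence[Any], vecs: List[Vec],
                eps: float, dist: Metric) -> List[int]:
    """Return the ids within eps of q, pruning with pivot filtering.
    Since |d(q, p) - d(o, p)| <= d(q, o) for every pivot p, a gap larger
    than eps on any pivot proves that o is not an eps-neighbor of q."""
    result = []
    for o in ids:
        lower_bound = max(abs(a - b) for a, b in zip(vecs[q], vecs[o]))
        if lower_bound > eps:
            continue  # pruned without computing d(q, o)
        if dist(data[q], data[o]) <= eps:
            result.append(o)
    return result


def local_dbscan(ids: List[int], data: Sequence[Any], vecs: List[Vec],
                 eps: float, min_pts: int, dist: Metric) -> Dict[int, int]:
    """Textbook DBSCAN over one partition; labels are cluster ids, -1 for noise."""
    labels: Dict[int, int] = {}
    cluster = 0
    for i in ids:
        if i in labels:
            continue
        neighbors = range_query(i, ids, data, vecs, eps, dist)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels.get(j) == -1:
                labels[j] = cluster  # former noise becomes a border point
            if j in labels:
                continue
            labels[j] = cluster
            j_neighbors = range_query(j, ids, data, vecs, eps, dist)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    vecs = map_to_pivot_space(pts, [pts[0], pts[8]], math.dist)
    all_ids = list(range(len(pts)))
    print(kd_partition(all_ids, vecs, 3))
    print(local_dbscan(all_ids, pts, vecs, eps=1.5, min_pts=3, dist=math.dist))
```

In this toy input the three equal-sized partitions cut through both dense groups, which illustrates why a distributed version cannot simply cluster each partition in isolation: objects near partition boundaries must be handled and local results merged. The paper's merging graph and sliding window technique are not reproduced in this sketch.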
Pages: 1346-1357
Number of pages: 12