DBSCAN-MS: Distributed Density-Based Clustering in Metric Spaces

Cited by: 9
Authors
Yang, Keyu [1]
Gao, Yunjun [1, 3]
Ma, Rui [1]
Chen, Lu [2]
Wu, Sai [1]
Chen, Gang [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China
[2] Aalborg Univ, Dept Comp Sci, Aalborg, Denmark
[3] Zhejiang Univ, Joint Inst Frontier Technol, Alibaba, Hangzhou, Zhejiang, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
MR-DBSCAN; ALGORITHM;
DOI
10.1109/ICDE.2019.00122
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
DBSCAN is an important density-based clustering method with a wide range of applications in areas such as machine learning and data mining. However, the rapidly growing volume and variety of data challenge traditional DBSCAN, so a distributed DBSCAN for metric spaces is required. In this paper, we propose DBSCAN-MS, a distributed density-based clustering algorithm for metric spaces. To ensure load balancing, we present a k-d tree based partitioning approach, which uses pivots to map the data from the metric space to a vector space and then employs k-d tree partitioning to divide the data equally. To avoid unnecessary computation and communication cost, we propose a framework that divides the data into partitions, finds the local DBSCAN results, and merges the local results based on a merging graph. In addition, pivot filtering and sliding window techniques are used in the framework for pruning. Extensive experiments with both real and synthetic datasets demonstrate the efficiency and scalability of DBSCAN-MS.
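
The pivot mapping, k-d tree partitioning, and pivot filtering named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`map_to_pivot_space`, `kd_partition`, `can_prune`), the Euclidean distance, the hand-picked pivots, and the median-split rule are all assumptions chosen for illustration. The only property relied on is the triangle inequality, which holds in any metric space.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def map_to_pivot_space(obj: Point, pivots: Sequence[Point], dist: Metric) -> Point:
    """Represent an object by its distances to each pivot (one coordinate per pivot)."""
    return tuple(dist(obj, p) for p in pivots)


def can_prune(mapped_q: Point, mapped_o: Point, eps: float) -> bool:
    """Pivot filtering: by the triangle inequality, |d(q,p) - d(o,p)| <= d(q,o)
    for every pivot p, so a gap larger than eps on any pivot proves d(q,o) > eps."""
    return any(abs(a - b) > eps for a, b in zip(mapped_q, mapped_o))


def kd_partition(mapped: Sequence[Point], indices: List[int],
                 depth: int, max_size: int) -> List[List[int]]:
    """Split object indices at the median of one pivot coordinate per level
    until every partition holds at most max_size objects (max_size >= 1)."""
    if len(indices) <= max_size:
        return [indices]
    axis = depth % len(mapped[0])  # cycle through the pivot dimensions
    ordered = sorted(indices, key=lambda i: mapped[i][axis])
    mid = len(ordered) // 2
    return (kd_partition(mapped, ordered[:mid], depth + 1, max_size)
            + kd_partition(mapped, ordered[mid:], depth + 1, max_size))


# Hypothetical toy data; any metric could replace math.dist here.
data: List[Point] = [(0, 0), (1, 0), (0, 1), (10, 10),
                     (11, 10), (10, 11), (5, 5), (20, 0)]
pivots: List[Point] = [(0, 0), (20, 0)]
mapped = [map_to_pivot_space(o, pivots, math.dist) for o in data]
parts = kd_partition(mapped, list(range(len(data))), depth=0, max_size=2)
```

Because each split is at the median, the partitions end up with nearly equal sizes, which is the load-balancing idea the abstract describes. `can_prune` only ever rules out pairs that are truly farther apart than `eps`, so the pairs it does not rule out still need an exact distance check.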
Pages: 1346-1357
Page count: 12