DBSCAN-MS: Distributed Density-Based Clustering in Metric Spaces

Cited by: 9
Authors
Yang, Keyu [1]
Gao, Yunjun [1, 3]
Ma, Rui [1]
Chen, Lu [2]
Wu, Sai [1]
Chen, Gang [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China
[2] Aalborg Univ, Dept Comp Sci, Aalborg, Denmark
[3] Zhejiang Univ, Joint Inst Frontier Technol, Alibaba, Hangzhou, Zhejiang, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
MR-DBSCAN; ALGORITHM;
DOI
10.1109/ICDE.2019.00122
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
DBSCAN is an important density-based clustering method with a wide range of applications in machine learning and data mining, to name but a few. However, the rapidly growing volume and variety of data today challenge traditional DBSCAN, and thus distributed DBSCAN in metric spaces is required. In this paper, we propose DBSCAN-MS, a distributed density-based clustering algorithm for metric spaces. To ensure load balancing, we present a k-d tree based partitioning approach. It uses pivots to map the data from metric spaces to vector spaces, and employs a k-d tree partitioning technique to divide the data equally. To avoid unnecessary computation and communication costs, we propose a framework that divides the data into partitions, finds local DBSCAN results, and merges the local results based on a merging graph. In addition, pivot filtering and sliding window techniques are used in the framework for pruning. Extensive experiments with both real and synthetic datasets demonstrate the efficiency and scalability of the proposed DBSCAN-MS.
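The pivot mapping, pivot filtering, and k-d tree style partitioning mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`map_to_pivot_space`, `can_prune`, `kd_partition`), the choice of edit distance as the metric, the hand-picked pivots, and the round-robin median split are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as an example metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def map_to_pivot_space(objs: Sequence[str], pivots: Sequence[str],
                       dist: Callable[[str, str], int]) -> List[Tuple[int, ...]]:
    """Map each object to the vector of its distances to the pivots."""
    return [tuple(dist(o, p) for p in pivots) for o in objs]


def can_prune(vec_a: Sequence[int], vec_b: Sequence[int], eps: float) -> bool:
    """Pivot filtering: by the triangle inequality, d(a, b) >= |d(a, p) - d(b, p)|
    for every pivot p, so a coordinate gap above eps rules out an eps-neighbor."""
    return max(abs(x - y) for x, y in zip(vec_a, vec_b)) > eps


def kd_partition(indices: Sequence[int], vectors: Sequence[Sequence[int]],
                 max_size: int, depth: int = 0) -> List[List[int]]:
    """Split indices at the median of one coordinate per level (k-d tree style),
    which keeps the resulting partitions close to equal in size."""
    if len(indices) <= max_size:
        return list([list(indices)])
    dim = depth % len(vectors[0])
    ordered = sorted(indices, key=lambda i: vectors[i][dim])
    mid = len(ordered) // 2
    return (kd_partition(ordered[:mid], vectors, max_size, depth + 1)
            + kd_partition(ordered[mid:], vectors, max_size, depth + 1))


# Hypothetical toy data; pivots are chosen arbitrarily for the example.
words = ["cluster", "clusters", "density", "destiny",
         "metric", "matrix", "pivot", "pilot"]
pivots = ["cluster", "pivot"]
vectors = map_to_pivot_space(words, pivots, edit_distance)
partitions = kd_partition(range(len(words)), vectors, max_size=2)
```

In this sketch, `can_prune` only ever discards pairs that cannot be within `eps` of each other, so the exact distance needs to be computed only for the pairs that survive the filter.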
Pages: 1346-1357
Number of pages: 12
Related papers
50 in total
  • [21] Constrained Density-Based Spatial Clustering of Applications with Noise (DBSCAN) using hyperparameter optimization
    Kim, Jongwon
    Lee, Hyeseon
    Ko, Young Myoung
    KNOWLEDGE-BASED SYSTEMS, 2024, 303
  • [22] MR-DBSCAN: An Efficient Parallel Density-based Clustering Algorithm using MapReduce
    He, Yaobin
    Tan, Haoyu
    Luo, Wuman
    Mao, Huajian
    Ma, Di
    Feng, Shengzhong
    Fan, Jianping
    2011 IEEE 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2011, : 473 - 480
  • [23] K-DBSCAN: An efficient density-based clustering algorithm supports parallel computing
    Deng C.
    Song J.
    Cai S.
    Sun R.
    Shi Y.
    Hao S.
    International Journal of Simulation and Process Modelling, 2018, 13 (05) : 496 - 505
  • [24] Density-based clustering
    Campello, Ricardo J. G. B.
    Kroeger, Peer
    Sander, Jorg
    Zimek, Arthur
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 10 (02)
  • [25] Density-based clustering
    Kriegel, Hans-Peter
    Kroeger, Peer
    Sander, Joerg
    Zimek, Arthur
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2011, 1 (03) : 231 - 240
  • [26] RNN-DBSCAN: A Density-Based Clustering Algorithm Using Reverse Nearest Neighbor Density Estimates
    Bryant, Avory
    Cios, Krzysztof
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (06) : 1109 - 1121
  • [27] Calculation of density-based clustering parameters supported with distributed processing
    Gorawski, Marcin
    Malczok, Rafal
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4081 : 417 - 426
  • [28] KR-DBSCAN: A density-based clustering algorithm based on reverse nearest neighbor and influence space
    Hu, Lihua
    Liu, Hongkai
    Zhang, Jifu
    Liu, Aiqin
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 186
  • [29] TSF-DBSCAN: A Novel Fuzzy Density-Based Approach for Clustering Unbounded Data Streams
    Bechini, Alessio
    Marcelloni, Francesco
    Renda, Alessandro
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2022, 30 (03) : 623 - 637
  • [30] FSS-DBSCAN: Outsourced Private Density-Based Clustering via Function Secret Sharing
    Fu, Jiaxuan
    Cheng, Ke
    Song, Anxiao
    Xia, Yuheng
    Chang, Zhao
    Shen, Yulong
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 7759 - 7773