DBSCAN-MS: Distributed Density-Based Clustering in Metric Spaces

Cited by: 9
Authors
Yang, Keyu [1]
Gao, Yunjun [1, 3]
Ma, Rui [1]
Chen, Lu [2]
Wu, Sai [1]
Chen, Gang [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China
[2] Aalborg Univ, Dept Comp Sci, Aalborg, Denmark
[3] Zhejiang Univ, Joint Inst Frontier Technol, Alibaba, Hangzhou, Zhejiang, Peoples R China
Funding
National Key Research and Development Program;
Keywords
MR-DBSCAN; ALGORITHM;
DOI
10.1109/ICDE.2019.00122
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
DBSCAN is an important density-based clustering method with a wide range of applications in areas such as machine learning and data mining. However, the rapidly growing volume and variety of data challenge traditional DBSCAN, so a distributed DBSCAN for metric spaces is required. In this paper, we propose DBSCAN-MS, a distributed density-based clustering algorithm for metric spaces. To ensure load balancing, we present a k-d tree based partitioning approach: it uses pivots to map data from the metric space into a vector space, and then applies k-d tree partitioning to divide the data evenly. To avoid unnecessary computation and communication costs, we propose a framework that divides the data into partitions, computes a local DBSCAN result for each partition, and merges the local results based on a merging graph. In addition, pivot filtering and sliding window techniques are used in the framework for pruning. Extensive experiments on both real and synthetic datasets demonstrate the efficiency and scalability of DBSCAN-MS.
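The pivot mapping, k-d tree partitioning, and pivot filtering mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`pivot_map`, `can_prune`, `kd_partition`), the choice of edit distance as the metric, the hand-picked pivots, and the median-split rule are all assumptions made for illustration. The pruning test assumes the standard triangle-inequality bound used in metric indexing, which the abstract does not spell out.

```python
from typing import Callable, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a metric on strings (no vector coordinates needed)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def pivot_map(obj, pivots: Sequence, dist: Callable) -> Tuple:
    """Map a metric-space object to a vector of its distances to each pivot."""
    return tuple(dist(obj, p) for p in pivots)


def can_prune(vx: Tuple, vy: Tuple, eps: float) -> bool:
    """Pivot filtering (assumed form): by the triangle inequality,
    |d(x, p) - d(y, p)| <= d(x, y) for every pivot p, so if any coordinate
    differs by more than eps, x and y cannot be eps-neighbours."""
    return any(abs(a - b) > eps for a, b in zip(vx, vy))


def kd_partition(items: List[Tuple], depth: int, max_depth: int) -> List[List[Tuple]]:
    """Split (object, vector) pairs at the median of one coordinate per level,
    cycling through coordinates, so that leaves hold roughly equal counts."""
    if depth == max_depth or len(items) <= 1:
        return [items]
    axis = depth % len(items[0][1])
    ordered = sorted(items, key=lambda it: it[1][axis])
    mid = len(ordered) // 2
    return (kd_partition(ordered[:mid], depth + 1, max_depth)
            + kd_partition(ordered[mid:], depth + 1, max_depth))


# Hypothetical toy data: strings under edit distance, with two hand-picked pivots.
words = ["cat", "cart", "card", "dog", "dot", "door", "zebra", "zero"]
pivots = ["cat", "zero"]
mapped = [(w, pivot_map(w, pivots, edit_distance)) for w in words]
partitions = kd_partition(mapped, depth=0, max_depth=2)
```

With eight objects and `max_depth=2`, the split yields four partitions of two objects each, which is the load-balancing property the partitioning aims for. A full distributed method would also need to handle objects within eps of a partition boundary and to merge local clusters, which this sketch leaves out.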
Pages: 1346-1357
Page count: 12