GB-DBSCAN: A fast granular-ball based DBSCAN clustering algorithm

Cited by: 4
Authors
Cheng, Dongdong [1, 2, 3, 4]
Zhang, Cheng [2, 3, 4]
Li, Ya [2, 3, 4]
Xia, Shuyin [2, 3, 4]
Wang, Guoyin [2, 3, 4]
Huang, Jinlong [1]
Zhang, Sulan [1]
Xie, Jiang [2, 3, 4]
Affiliations
[1] Yangtze Normal Univ, Coll Big Data & Intelligent Engn, Chongqing 408100, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Comp Intelligence, Chongqing 400065, Peoples R China
[3] Chongqing Univ Posts & Telecommun, Key Lab Cyberspace Big Data Intelligent Secur, Minist Educ, Chongqing 400065, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Key Lab Big Data Intelligent Comp, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DBSCAN; Granular-ball; KNN; Clustering
DOI
10.1016/j.ins.2024.120731
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) identifies high-density connected regions as clusters, which makes it well suited to discovering arbitrarily shaped clusters. However, its parameters are difficult to tune, and because it must scan every data point in turn, its time complexity is O(n^2). A granular-ball (GB) is a coarse-grained representation of data, based on the assumption that an object and its local neighbors follow a similar distribution and are likely to belong to the same class. Xia et al. introduced granular-balls into supervised learning to improve its efficiency. Inspired by this idea, we bring granular-balls into unsupervised learning and use them to speed up DBSCAN; we call the resulting algorithm GB-DBSCAN. Its main idea is to represent sets of data points by granular-balls and then cluster the granular-balls instead of the individual data points. First, we generate granular-balls with k-nearest neighbors (KNN), which is a bottom-up strategy, and describe each granular-ball by its center and radius. Next, the granular-balls are divided into Core-GBs and Non-Core-GBs according to their density. The Core-GBs are then merged into clusters following the idea of DBSCAN, and the Non-Core-GBs are assigned to the appropriate clusters. Because the number of granular-balls is much smaller than the number of objects in the dataset, the running time is greatly reduced compared with DBSCAN. In comparisons with the KNN-BLOCK DBSCAN, RNN-DBSCAN, DBSCAN, K-means, DP and SNN-DPC algorithms, the proposed algorithm obtains similar or even better clustering results in much less running time.
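The four steps named in the abstract (KNN-based ball generation, core/non-core split, merging of core balls, assignment of non-core balls) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact rules, so the function name `gb_dbscan_sketch`, the density measure (points per unit radius), the quantile-based core threshold `density_quantile`, and the surface-gap merge rule controlled by `eps` are all assumptions made here for illustration only.

```python
import numpy as np


def gb_dbscan_sketch(X, k=5, eps=0.0, density_quantile=0.5):
    """Toy granular-ball clustering; returns one cluster label per point.

    All rules below are illustrative assumptions, not the published method.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    ball_of = np.full(n, -1)  # index of the ball each point belongs to
    centers, radii, sizes = [], [], []

    # Step 1 (bottom-up): grow a ball from each still-unassigned point
    # together with its k nearest unassigned neighbours.
    for seed in range(n):
        if ball_of[seed] != -1:
            continue
        free = np.flatnonzero(ball_of == -1)  # seed is always free[0]
        dist = np.linalg.norm(X[free] - X[seed], axis=1)
        members = free[np.argsort(dist, kind="stable")[: k + 1]]
        center = X[members].mean(axis=0)
        ball_of[members] = len(centers)
        centers.append(center)
        radii.append(np.linalg.norm(X[members] - center, axis=1).max())
        sizes.append(len(members))
    centers, radii, sizes = np.array(centers), np.array(radii), np.array(sizes)

    # Step 2: split balls into core / non-core by an assumed density measure.
    density = sizes / np.maximum(radii, 1e-12)
    is_core = density >= np.quantile(density, density_quantile)
    core = np.flatnonzero(is_core)

    # Step 3: link core balls whose surfaces are at most eps apart, then
    # label connected groups of linked core balls (DBSCAN-style expansion).
    center_dist = np.linalg.norm(
        centers[core][:, None] - centers[core][None, :], axis=2
    )
    linked = center_dist - radii[core][:, None] - radii[core][None, :] <= eps
    ball_label = np.full(len(centers), -1)
    n_clusters = 0
    for i in range(len(core)):
        if ball_label[core[i]] != -1:
            continue
        ball_label[core[i]] = n_clusters
        stack = [i]
        while stack:
            j = stack.pop()
            for m in np.flatnonzero(linked[j]):
                if ball_label[core[m]] == -1:
                    ball_label[core[m]] = n_clusters
                    stack.append(m)
        n_clusters += 1

    # Step 4: each non-core ball joins the cluster of its nearest core ball.
    for b in np.flatnonzero(~is_core):
        d = np.linalg.norm(centers[core] - centers[b], axis=1)
        ball_label[b] = ball_label[core[np.argmin(d)]]

    # Every point inherits the label of the ball that contains it.
    return ball_label[ball_of]
```

In this sketch the pairwise distance computation in the merging step runs over ball centers only, so its cost depends on the number of balls (roughly n / (k + 1)) and not on the number of points, which is the source of the speed-up the abstract describes. The ball-generation loop here is deliberately naive and still scans the unassigned points for every seed; a practical version would use a spatial index for the KNN queries.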
Pages: 17