Rough-DBSCAN: A fast hybrid density based clustering method for large data sets

Cited by: 103
Authors
Viswanath, P. [2]
Babu, V. Suresh [1]
Affiliations
[1] Univ Bedfordshire, Inst Res Applicable Comp, Dept Comp & Informat Syst, Luton LU1 3JU, Beds, England
[2] NRI Inst Technol, Pattern Recognit Res Lab, Dept Comp Sci & Engn, Guntur 522009, Andhra Pradesh, India
Keywords
Clustering; Density based clustering; DBSCAN; Leaders; Rough sets;
DOI
10.1016/j.patrec.2009.08.008
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Density based clustering techniques like DBSCAN are attractive because they can find arbitrarily shaped clusters along with noisy outliers. However, DBSCAN's time requirement is O(n^2), where n is the size of the dataset, which makes it unsuitable for large datasets. The solution proposed in the paper is to first apply the leaders clustering method to derive prototypes, called leaders, from the dataset in a way that also preserves density information, and then to use these leaders to derive the density based clusters. The proposed hybrid clustering technique, called rough-DBSCAN, has a time complexity of only O(n) and is analyzed using rough set theory. Experimental studies on both synthetic and real world datasets compare rough-DBSCAN with DBSCAN. They show that, for large datasets, rough-DBSCAN finds a clustering similar to the one found by DBSCAN while being consistently faster. Some properties of the leaders as prototypes are also formally established. (C) 2009 Elsevier B.V. All rights reserved.
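The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's rough-DBSCAN algorithm: the function names `leaders` and `cluster_leaders`, the parameters `tau`, `eps` and `min_pts`, and the use of summed follower counts as the density estimate are all assumptions made for illustration. The second stage here compares every pair of leaders, so it is quadratic in the number of leaders rather than in the number of points.

```python
import math

def leaders(points, tau):
    """Single-scan leaders clustering (illustrative).
    Returns the leader points and, for each leader, how many
    points it represents (itself plus its followers)."""
    leader_pts, counts = [], []
    for p in points:
        for i, l in enumerate(leader_pts):
            if math.dist(p, l) <= tau:
                counts[i] += 1          # p follows the first leader within tau
                break
        else:
            leader_pts.append(p)        # no leader close enough: p becomes a leader
            counts.append(1)
    return leader_pts, counts

def cluster_leaders(leader_pts, counts, eps, min_pts):
    """DBSCAN-style expansion over leaders only (assumed scheme).
    A leader is treated as dense when the counts of all leaders
    within eps sum to at least min_pts. Returns one label per
    leader, with -1 marking noise."""
    n = len(leader_pts)
    neighbors = [
        [j for j in range(n) if math.dist(leader_pts[i], leader_pts[j]) <= eps]
        for i in range(n)
    ]
    dense = [sum(counts[j] for j in neighbors[i]) >= min_pts for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not dense[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:                    # grow the cluster through dense leaders
            cur = stack.pop()
            for j in neighbors[cur]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if dense[j]:
                        stack.append(j)
        cluster += 1
    return labels
```

In this sketch the counts returned by `leaders` carry the density information, so the second stage never revisits the original points; each point would simply inherit the label of its leader.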
Pages: 1477 - 1488
Number of pages: 12
Related papers
50 in total
  • [31] A Fast Density and Grid Based Clustering Method for Data With Arbitrary Shapes and Noise
    Wu, Bo
    Wilamowski, Bogdan M.
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2017, 13 (04) : 1620 - 1628
  • [32] Density-Accumulated Arbitrary Shaped Clustering for Large Data Sets
    Chen, Huaqi
    2013 2ND INTERNATIONAL SYMPOSIUM ON INSTRUMENTATION AND MEASUREMENT, SENSOR NETWORK AND AUTOMATION (IMSNA), 2013, : 1088 - 1092
  • [33] Efficient Distributed Density Peaks for Clustering Large Data Sets in MapReduce
    Zhang, Yanfeng
    Chen, Shimin
    Yu, Ge
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (12) : 3218 - 3230
  • [34] Clustering in large data sets with the limited memory bundle method
    Karmitsa, Napsu
    Bagirov, Adil M.
    Taheri, Sona
    PATTERN RECOGNITION, 2018, 83 : 245 - 259
  • [35] A knowledge mining method for continuous data based on fuzzy C-means clustering and rough sets
    Xu, Xi
    Yao, Qionghui
    Shi, Min
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 5846 - 5849
  • [36] A projection method for robust estimation and clustering in large data sets
    Pena, Daniel
    Prieto, Francisco J.
    DATA ANALYSIS, CLASSIFICATION AND THE FORWARD SEARCH, 2006, : 209 - +
  • [37] A hybrid algorithm for K-medoid clustering of large data sets
    Sheng, WG
    Liu, XH
    CEC2004: PROCEEDINGS OF THE 2004 CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1 AND 2, 2004, : 77 - 82
  • [38] A Hybrid and Parameter-Free Clustering Algorithm for Large Data Sets
    Shao, Hengkang
    Zhang, Ping
    Chen, Xinye
    Li, Fang
    Du, Guanglong
    IEEE ACCESS, 2019, 7 : 24806 - 24818
  • [39] KM-DBSCAN: Density-Based Clustering of Massive Spatial Data with Keywords
    Jang, Hong-Jun
    Kim, Byoungwook
    HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES, 2021, 11
  • [40] An Adaptive Hierarchical Clustering Method for Ship Trajectory Data Based on DBSCAN Algorithm
    Zhao, Liangbin
    Shi, Guoyou
    Yang, Jiaxuan
    2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA), 2017, : 334 - 341