Rough-DBSCAN: A fast hybrid density based clustering method for large data sets

Cited by: 103
Authors
Viswanath, P. [2]
Babu, V. Suresh [1]
Affiliations
[1] Univ Bedfordshire, Inst Res Applicable Comp, Dept Comp & Informat Syst, Luton LU1 3JU, Beds, England
[2] NRI Inst Technol, Pattern Recognit Res Lab, Dept Comp Sci & Engn, Guntur 522009, Andhra Pradesh, India
Keywords
Clustering; Density based clustering; DBSCAN; Leaders; Rough sets;
DOI
10.1016/j.patrec.2009.08.008
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Density based clustering techniques like DBSCAN are attractive because they can find arbitrarily shaped clusters along with noisy outliers. DBSCAN's time requirement is O(n^2), where n is the size of the dataset, which makes it unsuitable for large datasets. The solution proposed in the paper is to first apply the leaders clustering method to derive prototypes, called leaders, from the dataset in a way that also preserves density information, and then to use these leaders to derive the density based clusters. The proposed hybrid clustering technique, called rough-DBSCAN, has a time complexity of only O(n) and is analyzed using rough set theory. Experimental studies on both synthetic and real world datasets compare rough-DBSCAN with DBSCAN. They show that for large datasets rough-DBSCAN finds a clustering similar to the one found by DBSCAN, but is consistently faster. Some properties of the leaders as prototypes are also formally established. (C) 2009 Elsevier B.V. All rights reserved.
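The two-stage idea in the abstract (summarize the data with leaders that carry follower counts, then run a density based pass over the leaders only) can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the paper's exact procedure: the function names, the parameters `tau`, `eps`, and `min_pts`, and the density estimate (summing follower counts of leaders within `eps`) are all choices made here for illustration, and the rough set analysis from the paper is not reproduced.

```python
import math
from collections import deque


def leaders(points, tau):
    """Single-pass leaders clustering (illustrative).

    Each point joins the first existing leader within distance tau;
    otherwise it becomes a new leader. Follower counts are kept so that
    density information survives the summarization.
    """
    leader_pts, counts, assign = [], [], []
    for p in points:
        for i, leader in enumerate(leader_pts):
            if math.dist(p, leader) <= tau:
                counts[i] += 1
                assign.append(i)
                break
        else:
            # No leader is close enough: p becomes a new leader.
            leader_pts.append(p)
            counts.append(1)
            assign.append(len(leader_pts) - 1)
    return leader_pts, counts, assign


def cluster_leaders(leader_pts, counts, eps, min_pts):
    """DBSCAN-like pass over leaders only (assumed simplification).

    A leader's neighborhood size is approximated by summing the follower
    counts of all leaders within eps. Returns one label per leader,
    with -1 meaning noise.
    """
    m = len(leader_pts)
    neighbors = [
        [j for j in range(m) if math.dist(leader_pts[i], leader_pts[j]) <= eps]
        for i in range(m)
    ]
    density = [sum(counts[j] for j in nb) for nb in neighbors]
    labels = [-1] * m
    cluster = 0
    for i in range(m):
        if labels[i] != -1 or density[i] < min_pts:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:
            u = queue.popleft()
            if density[u] < min_pts:
                continue  # border leader: labeled but not expanded
            for v in neighbors[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    queue.append(v)
        cluster += 1
    return labels


def rough_dbscan_sketch(points, tau, eps, min_pts):
    """Label every point with the cluster of its leader (-1 = noise)."""
    leader_pts, counts, assign = leaders(points, tau)
    leader_labels = cluster_leaders(leader_pts, counts, eps, min_pts)
    return [leader_labels[i] for i in assign]


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
            (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
            (20, 20)]
    print(rough_dbscan_sketch(data, tau=0.5, eps=1.0, min_pts=3))
```

In this sketch the expensive pairwise distance computation is confined to the leaders, so when the number of leaders is small relative to n, the single pass over the data dominates the cost.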
Pages: 1477 - 1488
Number of pages: 12
Related papers
50 in total
  • [21] Dijkstra's-DBSCAN: Fast, Accurate, and Routable Density Based Clustering of Traffic Incidents on Large Road Network
    Zhang, Yang
    Hang, Lee D.
    Kim, Hyun
    TRANSPORTATION RESEARCH RECORD, 2018, 2672 (45) : 265 - 273
  • [22] Data summarization based fast hierarchical clustering method for large datasets
    Patra, Bidyut Kr.
    Nandi, Sukumar
    Viswanath, P.
    2009 INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT AND ENGINEERING, PROCEEDINGS, 2009, : 278 - +
  • [23] Dboost: A Fast Algorithm for DBSCAN-based Clustering on High Dimensional Data
    Zhang, Yuxiao
    Wang, Xiaorong
    Li, Bingyang
    Chen, Wei
    Wang, Tengjiao
    Lei, Kai
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT II, 2016, 9652 : 245 - 256
  • [24] DBSCAN++: Towards fast and scalable density clustering
    Jang, Jennifer
    Jiang, Heinrich
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [25] A fast DBSCAN algorithm for big data based on efficient density calculation
    Hanafi, Nooshin
    Saadatfar, Hamid
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 203
  • [26] An adaptive rough fuzzy single pass algorithm for clustering large data sets
    Asharaf, S
    Murty, MN
    PATTERN RECOGNITION, 2003, 36 (12) : 3015 - 3018
  • [27] NG-DBSCAN: Scalable Density-Based Clustering for Arbitrary Data
    Lulli, Alessandro
    Dell'Amico, Matteo
    Michiardi, Pietro
    Ricci, Laura
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 10 (03) : 157 - 168
  • [28] Clustering method of unbalanced large data density based on dynamic grid
    Wang, Yang
    WEB INTELLIGENCE, 2022, 20 (04) : 287 - 295
  • [29] A Data Labeling method for Categorical Data Clustering using Cluster Entropies in Rough Sets
    Reddy, H. Venkateswara
    Kumar, B. Suresh
    Raju, S. Viswanadha
    2014 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2014, : 444 - 449
  • [30] A visual and interactive data exploration method for large data sets and clustering
    Da Costa, David
    Venturini, Gilles
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2007, 4632 : 553 - +