Rough-DBSCAN: A fast hybrid density based clustering method for large data sets

Cited by: 103
Authors
Viswanath, P. [2 ]
Babu, V. Suresh [1 ]
Affiliations
[1] Univ Bedfordshire, Inst Res Applicable Comp, Dept Comp & Informat Syst, Luton LU1 3JU, Beds, England
[2] NRI Inst Technol, Pattern Recognit Res Lab, Dept Comp Sci & Engn, Guntur 522009, Andhra Pradesh, India
Keywords
Clustering; Density based clustering; DBSCAN; Leaders; Rough sets;
DOI
10.1016/j.patrec.2009.08.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Density based clustering techniques like DBSCAN are attractive because they can find arbitrarily shaped clusters along with noisy outliers. DBSCAN's time requirement is O(n^2), where n is the size of the dataset, which makes it unsuitable for large datasets. The solution proposed in the paper is to first apply the leaders clustering method to derive prototypes, called leaders, from the dataset in a way that also preserves density information, and then to use these leaders to derive the density based clusters. The proposed hybrid clustering technique, called rough-DBSCAN, has a time complexity of only O(n) and is analyzed using rough set theory. Experimental studies on both synthetic and real world datasets compare rough-DBSCAN with DBSCAN. They show that for large datasets rough-DBSCAN can find a clustering similar to that found by DBSCAN while being consistently faster. Some properties of the leaders as prototypes are also formally established. (C) 2009 Elsevier B.V. All rights reserved.
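The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the parameters `tau`, `eps` and `min_pts`, and the rule that a leader is "core" when the follower counts of all leaders within `eps` sum to at least `min_pts` are assumptions chosen for illustration. The abstract only states that leaders are derived first, that density information is preserved alongside them, and that the density based clusters are then derived from the leaders.

```python
import math
from collections import deque


def leaders(points, tau):
    """One-pass leaders clustering (assumed threshold name: tau).

    Returns the leader points, the number of followers per leader
    (the preserved density information), and each point's leader index.
    """
    leader_pts, counts, assign = [], [], []
    for p in points:
        for i, leader in enumerate(leader_pts):
            if math.dist(p, leader) <= tau:
                counts[i] += 1
                assign.append(i)
                break
        else:
            # No existing leader is within tau, so p becomes a new leader.
            leader_pts.append(p)
            counts.append(1)
            assign.append(len(leader_pts) - 1)
    return leader_pts, counts, assign


def cluster_leaders(leader_pts, counts, eps, min_pts):
    """DBSCAN-style pass over leaders only (illustrative simplification).

    A leader is treated as core when the follower counts of all leaders
    within eps sum to at least min_pts. Label -1 marks noise.
    """
    n = len(leader_pts)
    neighbors = [
        [j for j in range(n) if math.dist(leader_pts[i], leader_pts[j]) <= eps]
        for i in range(n)
    ]
    is_core = [sum(counts[j] for j in nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if labels[v] == -1:
                    labels[v] = cluster_id
                    if is_core[v]:
                        queue.append(v)  # only core leaders expand the cluster
        cluster_id += 1
    return labels


def rough_dbscan_sketch(points, tau, eps, min_pts):
    """Cluster the leaders, then give every point its leader's label."""
    leader_pts, counts, assign = leaders(points, tau)
    leader_labels = cluster_leaders(leader_pts, counts, eps, min_pts)
    return [leader_labels[a] for a in assign]


points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
          (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
          (20, 20)]
labels = rough_dbscan_sketch(points, tau=0.5, eps=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the nine points collapse to three leaders, so the quadratic neighbor search touches three prototypes instead of nine points, and the isolated point is reported as noise.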
Pages: 1477-1488
Number of pages: 12
Related papers
50 in total
  • [1] l-DBSCAN: A fast hybrid density based clustering method
    Viswanath, P.
    Pinkesh, Rajwala
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 912 - +
  • [2] A Fast Method of Coarse Density Clustering for Large Data Sets
    Zhao, Lei
    Yang, Jiwen
    Fan, Jianxi
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS, VOLS 1-4, 2009, : 1941 - 1945
  • [3] A Hybrid Clustering Algorithm Based on Grid Density and Rough Sets
    Lv Huigang
    Teng Peng
    Huang Jun
    Zhang Fengming
    PROCEEDINGS OF THE 27TH CHINESE CONTROL CONFERENCE, VOL 4, 2008, : 607 - 611
  • [4] dbscan: Fast Density-Based Clustering with R
    Hahsler, Michael
    Piekenbrock, Matthew
    Doran, Derek
    JOURNAL OF STATISTICAL SOFTWARE, 2019, 91 (01): : 1 - 30
  • [5] BLOCK-DBSCAN: Fast clustering for large scale data
    Chen, Yewang
    Zhou, Lida
    Bouguila, Nizar
    Wang, Cheng
    Chen, Yi
    Du, Jixiang
    PATTERN RECOGNITION, 2021, 109
  • [6] An Improved DBSCAN, A Density Based Clustering Algorithm with Parameter Selection for High Dimensional Data Sets
    Shah, Glory H.
    3RD NIRMA UNIVERSITY INTERNATIONAL CONFERENCE ON ENGINEERING (NUICONE 2012), 2012,
  • [7] MDST-DBSCAN: A Density-Based Clustering Method for Multidimensional Spatiotemporal Data
    Choi, Changlock
    Hong, Seong-Yun
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2021, 10 (06)
  • [8] Parameter reduction for density-based clustering on large data sets
    Wang, BY
    Perrizo, W
    COMPUTER APPLICATIONS IN INDUSTRY AND ENGINEERING, 2004, : 181 - 186
  • [9] DESCRY: A density based clustering algorithm for very large data sets
    Angiulli, F
    Pizzuti, C
    Ruffolo, M
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 203 - 210
  • [10] KNN-BLOCK DBSCAN: Fast Clustering for Large-Scale Data
    Chen, Yewang
    Zhou, Lida
    Pei, Songwen
    Yu, Zhiwen
    Chen, Yi
    Liu, Xin
    Du, Jixiang
    Xiong, Naixue
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (06): : 3939 - 3953