Rough-DBSCAN: A fast hybrid density based clustering method for large data sets

Cited by: 103

Authors:

Viswanath, P. [2]

Babu, V. Suresh [1]

Affiliations:

[1] Univ Bedfordshire, Inst Res Applicable Comp, Dept Comp & Informat Syst, Luton LU1 3JU, Beds, England

[2] NRI Inst Technol, Pattern Recognit Res Lab, Dept Comp Sci & Engn, Guntur 522009, Andhra Pradesh, India

Source:

PATTERN RECOGNITION LETTERS | 2009 / Vol. 30 / No. 16

Keywords:

Clustering; Density based clustering; DBSCAN; Leaders; Rough sets;

DOI:

10.1016/j.patrec.2009.08.008

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Density based clustering techniques like DBSCAN are attractive because they can find arbitrarily shaped clusters along with noisy outliers. However, DBSCAN's time requirement is O(n^2), where n is the size of the dataset, which makes it unsuitable for large datasets. The solution proposed in the paper is to first apply the leaders clustering method to derive prototypes, called leaders, from the dataset; besides acting as prototypes, these leaders also preserve the density information. The leaders are then used to derive the density based clusters. The proposed hybrid clustering technique, called rough-DBSCAN, has a time complexity of only O(n) and is analyzed using rough set theory. Experimental studies on both synthetic and real-world datasets compare rough-DBSCAN with DBSCAN. They show that for large datasets rough-DBSCAN can find a clustering similar to the one found by DBSCAN, while being consistently faster. Some properties of leaders as prototypes are also formally established. (C) 2009 Elsevier B.V. All rights reserved.
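The two-stage idea in the abstract (derive leaders in one pass, then run density based clustering over the leaders) can be illustrated with the minimal Python sketch below. It is not the authors' rough-DBSCAN: the rough-set approximation analysed in the paper is not reproduced. Instead, as a stated simplification, a leader's eps-neighbourhood size is estimated by summing the follower counts of all leaders within eps of it. Euclidean distance is assumed, and the function names (`leaders`, `cluster_leaders`, `rough_dbscan_sketch`) and parameter names (`tau` for the leaders threshold, `eps`, `min_pts`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def leaders(data: Sequence[Point], tau: float) -> Tuple[List[Point], List[int], List[int]]:
    """Single scan of the data (leaders clustering).

    Each point follows the first existing leader within distance tau;
    otherwise it becomes a new leader. The follower counts keep a coarse
    record of local density. Returns (leaders, counts, assign), where
    assign[k] is the index of the leader that point k follows.
    """
    lead: List[Point] = []
    counts: List[int] = []
    assign: List[int] = []
    for x in data:
        for i, l in enumerate(lead):
            if math.dist(x, l) <= tau:
                counts[i] += 1
                assign.append(i)
                break
        else:  # no leader close enough: x starts a new group
            lead.append(x)
            counts.append(1)
            assign.append(len(lead) - 1)
    return lead, counts, assign


def cluster_leaders(lead: List[Point], counts: List[int], eps: float, min_pts: int) -> List[int]:
    """DBSCAN-style expansion over the leaders only (label -1 = noise).

    Simplification (hypothetical, not the paper's rough-set approximation):
    a leader's eps-neighbourhood size is estimated as the total follower
    count of all leaders within eps of it, itself included.
    """
    m = len(lead)
    # Pairwise search over the m leaders, O(m^2); cheap when m is much smaller than n.
    nbrs = [[j for j in range(m) if math.dist(lead[i], lead[j]) <= eps] for i in range(m)]
    core = [sum(counts[j] for j in nbrs[i]) >= min_pts for i in range(m)]
    labels = [-1] * m
    cid = 0
    for i in range(m):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cid
        stack = [i]
        while stack:  # grow the cluster outward through core leaders
            p = stack.pop()
            for q in nbrs[p]:
                if labels[q] == -1:
                    labels[q] = cid
                    if core[q]:
                        stack.append(q)
        cid += 1
    return labels


def rough_dbscan_sketch(data: Sequence[Point], tau: float, eps: float, min_pts: int) -> List[int]:
    """Cluster the leaders, then give every point its leader's label."""
    lead, counts, assign = leaders(data, tau)
    labels = cluster_leaders(lead, counts, eps, min_pts)
    return [labels[a] for a in assign]
```

For example, with four points near (0, 0), four points near (5, 5) and one isolated point at (20, 20), calling `rough_dbscan_sketch(data, tau=0.5, eps=1.0, min_pts=3)` labels the two groups 0 and 1 and the isolated point -1 (noise). The leaders scan costs O(n·m) for m leaders, and the expansion step touches only the leaders. If m stays bounded as n grows, the total work grows linearly in n, which is consistent with the O(n) figure quoted in the abstract.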


Pages: 1477-1488

Page count: 12
