An efficient and scalable density-based clustering algorithm for datasets with complex structures

Cited by: 123
Authors
Lv, Yinghua [1]
Ma, Tinghuai [2]
Tang, Meili [3]
Cao, Jie [4]
Tian, Yuan [5]
Al-Dhelaan, Abdullah [5]
Al-Rodhaan, Mznah [5]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing 210044, Jiangsu, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Jiangsu Engn Ctr Network Monitoring, Nanjing 210044, Jiangsu, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Publ Adm, Nanjing 210044, Jiangsu, Peoples R China
[4] Nanjing Univ Informat Sci & Technol, Sch Econ & Management, Nanjing 210044, Jiangsu, Peoples R China
[5] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11362, Saudi Arabia
Funding
US National Science Foundation; China Postdoctoral Science Foundation;
Keywords
Density-based clustering; Locality sensitive hashing; The influence space; Border objects detecting; INDEXING METHOD;
DOI
10.1016/j.neucom.2015.05.109
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering, an unsupervised learning scheme and a research branch of data mining, assigns the objects of a dataset to groups, called clusters, without any prior knowledge. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most widely used clustering algorithms for spatial datasets; it can detect clusters of arbitrary shape and automatically identify noise points. However, DBSCAN has several troublesome limitations: (1) its performance depends on two user-specified parameters, epsilon and MinPts, where epsilon is the maximum radius of the neighborhood around the observed point and MinPts is the minimum number of data points that such a neighborhood must contain; (2) searching for the nearest neighbors of each object during cluster expansion is prohibitively time-consuming; (3) different starting points can produce quite different results; (4) DBSCAN cannot identify adjacent clusters of different densities. Beyond these limitations, the identification of border points is often ignored. In this paper, we address the above problems. First, we improve the traditional locality sensitive hashing method to implement fast nearest-neighbor queries. Second, we redefine several definitions on the basis of the influence space of each object, which takes both the nearest neighbors and the reverse nearest neighbors into account. The influence space is shown to be sensitive to local density changes, which reduces the number of parameters and makes it possible to identify adjacent clusters of different densities. Moreover, the new relationship based on the influence space makes the algorithm insensitive to the order in which points are input. Finally, we put forward a new concept, core density reachable, based on the influence space, which aims to distinguish border objects from noise objects. Experiments demonstrate that the proposed algorithm performs better than the traditional DBSCAN algorithm and the improved algorithm IS-DBSCAN. (C) 2015 Elsevier B.V. All rights reserved.
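The two ideas named in the abstract, the epsilon/MinPts neighborhood test of DBSCAN and an influence space built from nearest and reverse nearest neighbors, can be illustrated with a minimal brute-force sketch. This is not the authors' implementation: the function names are hypothetical, the paper's locality sensitive hashing speed-up is replaced by an exhaustive search, and the influence space is assumed here to be the intersection of the k nearest neighbors and the reverse k nearest neighbors, a definition the abstract itself does not spell out.

```python
from math import dist  # Euclidean distance, Python 3.8+


def core_points(points, eps, min_pts):
    """Indices of points whose eps-neighborhood (the point itself included)
    contains at least min_pts points: the classic DBSCAN core test."""
    core = []
    for i, p in enumerate(points):
        count = sum(1 for q in points if dist(p, q) <= eps)
        if count >= min_pts:
            core.append(i)
    return core


def knn(points, i, k):
    """Indices of the k nearest neighbors of point i, excluding i.
    Brute force; the paper uses an improved LSH scheme instead."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return set(others[:k])


def reverse_knn(points, i, k):
    """Indices of the points that count point i among their k nearest neighbors."""
    return {j for j in range(len(points)) if j != i and i in knn(points, j, k)}


def influence_space(points, i, k):
    """ASSUMPTION: influence space = kNN(i) intersected with reverse kNN(i).
    The abstract only says both neighbor sets are taken into account."""
    return knn(points, i, k) & reverse_knn(points, i, k)


# Four points on a unit square plus one distant outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
square_core = core_points(points, eps=1.5, min_pts=4)
outlier_space = influence_space(points, 4, k=2)
corner_space = influence_space(points, 0, k=2)
```

In this toy dataset the outlier has nearest neighbors but no point lists it among its own, so under the assumed definition its influence space is empty, while each corner of the square keeps a non-empty one. Only a single parameter k is needed for that distinction, which is the kind of local-density sensitivity the abstract describes.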
Pages: 9-22
Number of pages: 14
Related papers
50 in total
  • [1] Efficient incremental density-based algorithm for clustering large datasets
    Bakr, Ahmad M.
    Ghanem, Nagia M.
    Ismail, Mohamed A.
    [J]. ALEXANDRIA ENGINEERING JOURNAL, 2015, 54 (04) : 1147 - 1154
  • [2] AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets
    Mai, Son T.
    Assent, Ira
    Storgaard, Martin
    [J]. KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 1025 - 1034
  • [3] An Efficient And Scalable Density-Based Clustering Algorithm For Normalize Data
    Nidhi
    Patel, Km Archana
    [J]. 2ND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING, COMMUNICATION & CONVERGENCE, ICCC 2016, 2016, 92 : 136 - 141
  • [4] An Efficient Density-Based Algorithm for Data Clustering
    Theljani, Foued
    Laabidi, Kaouther
    Zidi, Salah
    Ksouri, Moufida
    [J]. INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2017, 26 (04)
  • [5] EFFICIENT DENSITY-BASED PARTITIONAL CLUSTERING ALGORITHM
    Alamgir, Zareen
    Naveed, Hina
    [J]. COMPUTING AND INFORMATICS, 2021, 40 (06) : 1322 - 1344
  • [6] MIDBSCAN: An Efficient Density-Based Clustering Algorithm
    Tsai, Cheng-Fa
    Sung, Chun-Yi
    [J]. SIXTH INTERNATIONAL SYMPOSIUM ON NEURAL NETWORKS (ISNN 2009), 2009, 56 : 469 - 479
  • [7] Efficient density-based clustering of complex objects
    Brecheisen, S
    Kriegel, HP
    Pfeifle, M
    [J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 43 - 50
  • [8] ADvaNCE - Efficient and Scalable Approximate Density-Based Clustering Based on Hashing
    Li, Tianrun
    Heinis, Thomas
    Luk, Wayne
    [J]. INFORMATICA, 2017, 28 (01) : 105 - 130
  • [9] An Efficient Density-based clustering algorithm for face groping
    Pei, Shenfei
    Nie, Feiping
    Wang, Rong
    Li, Xuelong
    [J]. NEUROCOMPUTING, 2021, 462 : 331 - 343
  • [10] Scalable density-based distributed clustering
    Januzaj, E
    Kriegel, HP
    Pfeifle, M
    [J]. KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2004, PROCEEDINGS, 2004, 3202 : 231 - 244