PFHC: A clustering algorithm based on data partitioning for unevenly distributed datasets

Cited by: 3
Authors
Dong, Yihong [1]
Cao, Shaoka
Chen, Ken [2]
He, Maoshun
Tai, Xiaoying
Affiliations
[1] Ningbo Univ, Dept Comp Sci, Coll Informat Sci & Engn, Inst Comp Sci & Technol, Ningbo 315211, Zhejiang, Peoples R China
[2] Ningbo Univ, Inst Circuit & Syst, Ningbo 315211, Zhejiang, Peoples R China
Keywords
Data mining; Fuzzy clustering; Unevenly distributed dataset; Data partitioning
DOI
10.1016/j.fss.2008.11.012
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Recently, many researchers have devoted effort to clustering as a primary data mining method for knowledge discovery, but only a few have focused on unevenly distributed datasets. In previous research, we proposed FHC, an efficient hierarchical algorithm based on fuzzy graph connectedness, to discover clusters with arbitrary shapes. In this paper, we present PFHC, a novel clustering algorithm for unevenly distributed datasets that extends FHC. In PFHC, the dataset is first divided into several local spaces according to the density of the data distribution, so that the data density within each local space is nearly uniform. To achieve this, local values of ε and λ are used in each local domain to obtain a local clustering result with FHC. Next, the boundaries between local areas are taken into account when the areas are combined. Finally, the local clusters are merged to obtain global clusters. As an extension of FHC, PFHC handles uneven datasets more effectively and efficiently and, as the experiments show, generates better-quality clusters than other methods. Furthermore, this work shows that PFHC can also process incremental data. (C) 2008 Elsevier B.V. All rights reserved.
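The stages the abstract describes (density-based partitioning into local spaces, local clustering with per-region parameters, boundary handling, and merging into global clusters) can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the authors' PFHC or FHC: density is estimated from each point's k-th nearest-neighbour distance, local spaces are formed by splitting wherever that distance jumps sharply, FHC's fuzzy-graph connectedness is replaced by an assumed membership μ(p, q) = 1 − d(p, q)/ε with a λ-cut, and local clusters from different local spaces are merged when some cross pair lies within the smaller of the two local radii. All function names and the parameters `k`, `jump`, `eps_scale`, and `lam` are hypothetical.

```python
import math
from statistics import median


def knn_distance(points, k):
    """Distance from each point to its k-th nearest neighbour (a simple density proxy)."""
    out = []
    for i, p in enumerate(points):
        ds = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(ds[min(k, len(ds)) - 1] if ds else 0.0)
    return out


def partition_by_density(kdist, jump):
    """Split point indices into 'local spaces' of roughly uniform density.

    Hypothetical rule: sort points by k-NN distance and start a new group
    wherever that distance grows by more than the factor `jump`.
    """
    order = sorted(range(len(kdist)), key=kdist.__getitem__)
    groups = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if kdist[cur] > jump * kdist[prev]:
            groups.append([])
        groups[-1].append(cur)
    return groups


class DisjointSet:
    """Union-find used to merge points and local clusters."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def pfhc_sketch(points, k=2, jump=2.0, eps_scale=2.0, lam=0.25):
    """Partition -> local clustering -> boundary merge; returns one label per point."""
    if not points:
        return []
    n = len(points)
    kdist = knn_distance(points, k)
    groups = partition_by_density(kdist, jump)
    radius = [0.0] * n
    group_of = [0] * n
    ds = DisjointSet(n)

    # Local clustering: each local space gets its own epsilon. With the assumed
    # membership mu(p, q) = 1 - d(p, q) / eps, the lambda-cut mu >= lam is
    # equivalent to d(p, q) <= eps * (1 - lam).
    for g_id, g in enumerate(groups):
        eps = eps_scale * median(kdist[i] for i in g)
        r = eps * (1 - lam)
        for i in g:
            radius[i], group_of[i] = r, g_id
        for a, i in enumerate(g):
            for j in g[a + 1:]:
                if math.dist(points[i], points[j]) <= r:
                    ds.union(i, j)

    # Boundary merge (assumed, conservative rule): join clusters from different
    # local spaces when a cross pair lies within the smaller local radius.
    for i in range(n):
        for j in range(i + 1, n):
            if group_of[i] != group_of[j] and \
                    math.dist(points[i], points[j]) <= min(radius[i], radius[j]):
                ds.union(i, j)

    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(ds.find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    dense = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1)]
    sparse = [(1.2, 0), (2.2, 0), (1.2, 1), (2.2, 1)]
    print(pfhc_sketch(dense + sparse))            # [0, 0, 0, 0, 1, 1, 1, 1]
    print(pfhc_sketch(dense + sparse, jump=1e9))  # [0, 0, 0, 0, 1, 2, 3, 4]
```

With the defaults, the dense square and the nearby sparse square come out as two clusters. Forcing a single local space (`jump=1e9`) yields one global ε that is too small for the sparse square, which then falls apart into singletons. This is the kind of failure on uneven data that motivates choosing ε and λ locally.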
Pages: 1886-1901
Page count: 16