PFHC: A clustering algorithm based on data partitioning for unevenly distributed datasets

Cited by: 3
Authors
Dong, Yihong [1]
Cao, Shaoka
Chen, Ken [2]
He, Maoshun
Tai, Xiaoying
Affiliations
[1] Ningbo Univ, Dept Comp Sci, Coll Informat Sci & Engn, Inst Comp Sci & Technol, Ningbo 315211, Zhejiang, Peoples R China
[2] Ningbo Univ, Inst Circuit & Syst, Ningbo 315211, Zhejiang, Peoples R China
Keywords
Data mining; Fuzzy clustering; Unevenly distributed dataset; Data partitioning
DOI
10.1016/j.fss.2008.11.012
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Many researchers have recently studied clustering as a primary data mining method for knowledge discovery, but only a few have focused on unevenly distributed datasets. In previous work, we proposed FHC, an efficient hierarchical algorithm based on fuzzy graph connectedness, to discover clusters of arbitrary shape. In this paper, we present PFHC, a novel clustering algorithm for uneven datasets that extends FHC. In PFHC, the dataset is first divided into several local spaces according to the density of the data distribution, so that the data density within any local space is nearly uniform. To this end, local values of the parameters ε and λ are used in each local domain to obtain a local clustering result with FHC. The boundaries between local areas are then taken into account for combination, and finally the local clusters are merged to obtain global clusters. As an extension of FHC, PFHC handles uneven datasets more effectively and efficiently and, as the experiments show, generates better-quality clusters than other methods. Furthermore, PFHC is also found to be able to process incremental data. (C) 2008 Elsevier B.V. All rights reserved.
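The partition, local clustering, and merge steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces FHC's fuzzy graph connectedness with plain distance-threshold connectivity, splits the data into only two density regions using a nearest-neighbour heuristic, and merges across the boundary with the smaller of the two local thresholds. All function names and the `scale` parameter are hypothetical.

```python
from math import dist
from statistics import median


def connected_components(points, eps):
    """Group points linked by chains of neighbours no farther apart than eps."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def merge_touching(local):
    """Merge local clusters that lie within the smaller of their thresholds."""
    clusters = [(list(pts), eps) for pts, eps in local]
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (a, ea), (b, eb) = clusters[i], clusters[j]
                if any(dist(p, q) <= min(ea, eb) for p in a for q in b):
                    clusters[i] = (a + b, min(ea, eb))
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break
    return [pts for pts, _ in clusters]


def partitioned_clustering(points, scale=2.0):
    # Step 1 (assumed heuristic): split into a dense and a sparse region
    # by comparing each point's nearest-neighbour distance to the median.
    nn = [min(dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    cut = median(nn)
    regions = [
        [(p, d) for p, d in zip(points, nn) if d <= cut],
        [(p, d) for p, d in zip(points, nn) if d > cut],
    ]

    # Step 2: cluster each region with its own local threshold.
    local = []
    for region in regions:
        if not region:
            continue
        eps = scale * median(d for _, d in region)
        for cluster in connected_components([p for p, _ in region], eps):
            local.append((cluster, eps))

    # Step 3: merge local clusters that touch across region boundaries.
    return merge_touching(local)


# A tight blob and a loose blob: one global threshold fits only one of them.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
          (10, 10), (12, 10), (10, 12), (12, 12)]
clusters = partitioned_clustering(points)
print(sorted(len(c) for c in clusters))  # [4, 4]
```

With a single global threshold tuned to the dense blob (0.2 here), the sparse blob would break into four singletons, which is the kind of failure on uneven data that per-region parameters are meant to avoid.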
Pages: 1886-1901
Page count: 16