A Valid Clustering Algorithm for High-dimensional Large Data Sets Based on Distributed Method

Cited by: 0

Authors:

Guo Xian e [1]

Yan Junmei [1]

Affiliation:

[1] Math & Comp Sci Inst, Datong, Shanxi, Peoples R China

Source:

PROCEEDINGS OF 2009 INTERNATIONAL WORKSHOP ON INFORMATION SECURITY AND APPLICATION | 2009

Keywords:

fuzzy clustering; distributed method; genetic algorithm; fuzzy dissimilar matrix; large data sets; high dimension;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The data set is first divided at random into several subsets. Each subset is then clustered with a fuzzy clustering method for high-dimensional data based on a genetic algorithm. The method introduces a fuzzy dissimilarity matrix to express the degree of dissimilarity between any two data points, and initializes the high-dimensional samples as points on a two-dimensional plane. The genetic algorithm then iteratively optimizes the two-dimensional coordinates so that the Euclidean distance between points on the plane gradually approximates the fuzzy dissimilarity between the corresponding samples. Finally, the two-dimensional data are clustered with the FCM (fuzzy c-means) algorithm, which avoids the dependence of clustering validity on the spatial distribution of the high-dimensional samples. Experimental results show that the method produces high-quality clusters and greatly improves clustering speed.
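The steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define the fuzzy dissimilarity measure, the genetic operators, or how the per-subset results are combined, so the normalized Euclidean dissimilarity, the elitist GA with one-point crossover and Gaussian mutation, and every function name and parameter below are assumptions chosen for illustration.

```python
import math
import random

def dissimilarity_matrix(samples):
    """Pairwise dissimilarity in [0, 1] (assumed: max-normalized Euclidean distance)."""
    n = len(samples)
    d = [[math.dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    top = max(max(row) for row in d) or 1.0
    return [[v / top for v in row] for row in d]

def stress(coords, diss):
    """Sum of squared gaps between 2-D distances and target dissimilarities."""
    n = len(diss)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            gap = math.dist(coords[2 * i:2 * i + 2], coords[2 * j:2 * j + 2]) - diss[i][j]
            total += gap * gap
    return total

def ga_embed(diss, rng, pop_size=30, generations=200, sigma=0.05):
    """Evolve flat [x0, y0, x1, y1, ...] chromosomes that minimize stress."""
    n = len(diss)
    pop = [[rng.random() for _ in range(2 * n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: stress(c, diss))
        elite = pop[:pop_size // 2]            # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, 2 * n)      # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([g + rng.gauss(0, sigma) for g in child])  # mutation
        pop = elite + children
    return min(pop, key=lambda c: stress(c, diss))

def fcm(points, k, rng, m=2.0, iters=50):
    """Standard fuzzy c-means on 2-D points; returns (centers, memberships)."""
    n = len(points)
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        centers = []
        for c in range(k):
            w = [u[i][c] ** m for i in range(n)]
            tot = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / tot
                                 for d in range(2)))
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            for c in range(k):
                u[i][c] = 1.0 / sum((dist[c] / dist[j]) ** (2 / (m - 1)) for j in range(k))
    return centers, u

def cluster_subsets(samples, n_subsets, k, seed=0):
    """Randomly split the data, then embed and cluster each subset independently."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    results = []
    for s in range(n_subsets):
        idx = order[s::n_subsets]
        diss = dissimilarity_matrix([samples[i] for i in idx])
        flat = ga_embed(diss, rng)
        points = [(flat[2 * t], flat[2 * t + 1]) for t in range(len(idx))]
        _, u = fcm(points, k, rng)
        labels = [row.index(max(row)) for row in u]  # hardened labels, local to the subset
        results.append((idx, labels))
    return results
```

Calling `cluster_subsets(samples, n_subsets=2, k=2)` returns, for each subset, the original sample indices and a cluster label per sample. The labels are local to each subset; reconciling them across subsets is left out because the abstract does not say how that step is done.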


Pages: 1 / 6

Page count: 6
