A Valid Clustering Algorithm for High-dimensional Large Data Sets Based on Distributed Method

Cited by: 0

Authors:

Guo Xian e [1]

Yan Junmei [1]

Affiliation:

[1] Math & Comp Sci Inst, Datong, Shanxi, Peoples R China

Source:

PROCEEDINGS OF 2009 INTERNATIONAL WORKSHOP ON INFORMATION SECURITY AND APPLICATION | 2009

Keywords:

fuzzy clustering; distributed method; genetic algorithm; fuzzy dissimilar matrix; large data sets; high dimension;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The data set is randomly divided into several subsets, and a fuzzy clustering method for high-dimensional data based on a genetic algorithm is proposed to cluster each subset. The method introduces a fuzzy dissimilar matrix to express the degree of dissimilarity between any two samples and initializes the high-dimensional samples as points in a two-dimensional plane. A genetic algorithm then iteratively optimizes the two-dimensional coordinates so that the Euclidean distances between points in the plane gradually approximate the fuzzy dissimilarity between the corresponding samples. Finally, the two-dimensional data are clustered with the FCM (fuzzy c-means) algorithm, which avoids the dependence of clustering validity on the spatial distribution of the high-dimensional samples. Experimental results show that the method produces high-quality clusterings and greatly improves clustering speed.
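The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the fuzzy dissimilar matrix, so a Euclidean distance scaled to [0, 1] is assumed as a stand-in. The genetic operators (truncation selection with elitism, uniform crossover, Gaussian mutation) and all function names and parameters are likewise hypothetical choices. How the per-subset results are merged is not stated in the abstract, so that step is left out.

```python
import numpy as np

def fuzzy_dissimilar_matrix(X):
    """Assumed stand-in: pairwise Euclidean distance scaled to [0, 1]."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    m = d.max()
    return d / m if m > 0 else d

def stress(Y, D):
    """Squared mismatch between 2-D Euclidean distances and target dissimilarities."""
    d2 = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    return float(((d2 - D) ** 2).sum())

def ga_embed(D, pop_size=30, generations=200, sigma=0.05, seed=0):
    """Evolve 2-D coordinates whose distances approximate D (hypothetical GA settings)."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    pop = rng.random((pop_size, n, 2))  # each individual is one candidate 2-D layout
    for _ in range(generations):
        fitness = np.array([stress(ind, D) for ind in pop])
        elite = pop[np.argsort(fitness)[: pop_size // 2]]  # keep the better half unchanged
        k = pop_size - len(elite)
        pa = elite[rng.integers(0, len(elite), k)]
        pb = elite[rng.integers(0, len(elite), k)]
        mask = rng.random((k, n))[..., None] < 0.5  # per-point uniform crossover
        children = np.where(mask, pa, pb) + rng.normal(0.0, sigma, (k, n, 2))
        pop = np.concatenate([elite, children])
    fitness = np.array([stress(ind, D) for ind in pop])
    return pop[np.argmin(fitness)]

def fcm(Y, c=2, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means; returns memberships U (n x c) and cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(Y), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ Y) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

def cluster_in_subsets(X, n_subsets=3, c=2, seed=0):
    """Randomly split X, then embed and cluster each subset independently."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_subsets)
    results = []
    for idx in parts:
        Y = ga_embed(fuzzy_dissimilar_matrix(X[idx]), seed=seed)
        U, _ = fcm(Y, c=c, seed=seed)
        results.append((idx, U))  # original row indices and their fuzzy memberships
    return results
```

Calling `cluster_in_subsets(X)` on an `(n_samples, n_features)` array returns, for each random subset, the original row indices and an `(n_subset, c)` membership matrix. Because FCM runs on the two-dimensional layout rather than on the original features, its cost per iteration no longer depends on the input dimensionality, which is one plausible reading of the speed-up the abstract reports.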

Pages: 1 / 6

Page count: 6
