A Valid Clustering Algorithm for High-dimensional Large Data Sets Based on Distributed Method

Cited by: 0
Authors
Guo Xian e [1]
Yan Junmei [1]
Affiliations
[1] Math & Comp Sci Inst, Datong, Shanxi, Peoples R China
Keywords
fuzzy clustering; distributed method; genetic algorithm; fuzzy dissimilar matrix; large data sets; high dimension;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data sets are randomly divided into several subsets, and a genetic-algorithm-based fuzzy clustering method for high-dimensional data is proposed to cluster each subset. The method introduces a fuzzy dissimilarity matrix to express the degree of dissimilarity between any two data points and initializes the high-dimensional samples on a two-dimensional plane. A genetic algorithm then iteratively optimizes the coordinates on the plane, so that the Euclidean distance between points on the plane gradually approximates the fuzzy dissimilarity between the corresponding samples. Finally, the two-dimensional data are clustered with the FCM (fuzzy c-means) algorithm, which avoids the dependence of clustering validity on the spatial distribution of the high-dimensional samples. Experimental results show that the method produces high-quality clusterings and greatly improves clustering speed.
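The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the dissimilarity measure, the genetic operators, or how per-subset results are combined, so the following are all assumptions made for illustration: the dissimilarity is a Euclidean distance normalized to [0, 1], the genetic algorithm uses elitist selection with uniform crossover and Gaussian mutation, and every function name and parameter (`dissimilarity_matrix`, `stress`, `evolve_embedding`, `fuzzy_c_means`, `cluster_subsets`, `pop_size`, `sigma`, and so on) is hypothetical. Each subset is clustered independently and no merging step is shown.

```python
import math
import random


def dissimilarity_matrix(samples):
    """Pairwise dissimilarity in [0, 1] (assumption: normalized Euclidean distance)."""
    n = len(samples)
    d = [[math.dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in d) or 1.0  # avoid division by zero
    return [[v / d_max for v in row] for row in d]


def stress(coords, dissim):
    """Sum of squared gaps between 2-D distances and target dissimilarities."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - dissim[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def evolve_embedding(dissim, rng, pop_size=30, generations=200, sigma=0.05):
    """Toy genetic algorithm: elitism, uniform crossover, Gaussian mutation."""
    n = len(dissim)
    # Each individual is one candidate set of 2-D coordinates for all samples.
    pop = [[(rng.random(), rng.random()) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: stress(c, dissim))
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [a[k] if rng.random() < 0.5 else b[k] for k in range(n)]
            child = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                     for x, y in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda c: stress(c, dissim))


def fuzzy_c_means(points, c, rng, m=2.0, iterations=50):
    """Standard FCM updates on 2-D points; returns (centers, memberships)."""
    n = len(points)
    u = []
    for _ in range(n):  # random initial memberships, each row sums to 1
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iterations):
        centers = []
        for j in range(c):  # centers are membership-weighted means
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][k] for i in range(n)) / total
                for k in range(2)))
        for i in range(n):  # memberships from relative distances to centers
            dist = [max(math.dist(points[i], centers[j]), 1e-12) for j in range(c)]
            u[i] = [1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0))
                              for l in range(c))
                    for j in range(c)]
    return centers, u


def cluster_subsets(samples, n_subsets, c, seed=0):
    """Randomly split the samples, then embed and cluster each subset on its own."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    results = []
    for s in range(n_subsets):
        idx = order[s::n_subsets]  # original indices of this subset
        subset = [samples[i] for i in idx]
        coords = evolve_embedding(dissimilarity_matrix(subset), rng)
        _, memberships = fuzzy_c_means(coords, c, rng)
        results.append((idx, coords, memberships))
    return results
```

Calling `cluster_subsets(samples, n_subsets=2, c=2)` on a list of equal-length numeric vectors returns, for each subset, the original sample indices, the evolved two-dimensional coordinates, and the fuzzy membership rows. Because the subsets are processed independently, the loop over subsets is the natural place to distribute the work.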
Pages: 1 / 6
Page count: 6
Related papers
50 in total
  • [11] EFFECTIVE CLUSTERING ALGORITHM FOR HIGH-DIMENSIONAL SPARSE DATA BASED ON SOM
    Martinovic, Jan
    Slaninova, Katerina
    Vojacek, Lukas
    Drazdilova, Pavla
    Dvorsky, Jiri
    Vondrak, Ivo
    [J]. NEURAL NETWORK WORLD, 2013, 23 (02) : 131 - 147
  • [12] Outlier mining in large high-dimensional data sets
    Angiulli, F
    Pizzuti, C
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (02) : 203 - 215
  • [13] Functional clustering algorithm for high-dimensional proteomics data
    Bensmail, H
    Aruna, B
    Semmes, OJ
    Haoudi, A
    [J]. JOURNAL OF BIOMEDICINE AND BIOTECHNOLOGY, 2005, (02) : 80 - 86
  • [14] Evolutionary Subspace Clustering Algorithm for High-Dimensional Data
    Nourashrafeddin, S. N.
    Arnold, Dirk V.
    Milios, Evangelos
    [J]. PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTATION COMPANION (GECCO'12), 2012, : 1497 - 1498
  • [15] DACC: A Data Exploration Method for High-Dimensional Data Sets
    Zhao, Qingnan
    Li, Hui
    Chen, Mei
    Dai, Zhenyu
    Zhu, Ming
    [J]. ARTIFICIAL INTELLIGENCE AND ALGORITHMS IN INTELLIGENT SYSTEMS, 2019, 764 : 219 - 229
  • [16] AGRID: An efficient algorithm for clustering large high-dimensional datasets
    Zhao, YC
    Song, JD
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2003, 2637 : 271 - 282
  • [17] A density-based clustering algorithm for high-dimensional data with feature selection
    Qi Xianting
    Wang Pan
    [J]. 2016 2ND INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS - COMPUTING TECHNOLOGY, INTELLIGENT TECHNOLOGY, INDUSTRIAL INFORMATION INTEGRATION (ICIICII), 2016, : 114 - 118
  • [18] A grid-based subspace clustering algorithm for high-dimensional data streams
    Sun, Yufen
    Lu, Yansheng
    [J]. WEB INFORMATION SYSTEMS - WISE 2006 WORKSHOPS, PROCEEDINGS, 2006, 4256 : 37 - 48
  • [19] GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data
    Mansoori, Eghbal G.
    [J]. SOFT COMPUTING, 2014, 18 (05) : 905 - 922
  • [20] GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data
    Eghbal G. Mansoori
    [J]. Soft Computing, 2014, 18 : 905 - 922