An Improved Initialization Method for Clustering High-Dimensional Data

Authors
Zhang, Yanping [1]
Jiang, Qingshan [1]
Affiliation
[1] Xiamen Univ, Software Sch, Xiamen 361005, Fujian, Peoples R China
Keywords
K-Means type clustering; initialization method; distance weight coefficient; neighborhood density
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
Searching for initial centers in high-dimensional space is an interesting and important problem that is relevant to the many variants of the K-Means algorithm. However, the problem is very difficult because of the "curse of dimensionality" and the inherent sparsity of the data. Algorithm IMSND is one of the latest initialization methods based on the idea of sharing neighborhood density. To address the accuracy and the input parameters of IMSND, an optimized algorithm is presented that employs a new density measure with a distance weight coefficient to improve search accuracy. Experimental results on real-world datasets show that our algorithm outperforms other algorithms, including IMSND.
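The abstract does not give the density formula or the selection rule, so the following is only a minimal illustrative sketch of the general idea of density-based center initialization with distance weighting. It is not the authors' algorithm and not IMSND. The weight `1 / (1 + d)`, the neighbor count `n_neighbors`, the "density times distance to nearest chosen center" score, and all function names are assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_density(points, n_neighbors):
    """Density of each point: sum of distance weights over its nearest neighbors.

    Assumption: the weight 1 / (1 + d) is a stand-in for a "distance weight
    coefficient"; closer neighbors contribute more to the density.
    """
    densities = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:n_neighbors]
        densities.append(sum(1.0 / (1.0 + d) for d in nearest))
    return densities


def pick_initial_centers(points, k, n_neighbors=3):
    """Pick k initial centers that are dense and far from each other.

    Assumption: the first center is the densest point; each later center
    maximizes density * distance to the closest already chosen center.
    """
    density = weighted_density(points, n_neighbors)
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    while len(chosen) < k:
        def score(i):
            gap = min(euclidean(points[i], points[c]) for c in chosen)
            return density[i] * gap

        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [points[i] for i in chosen]
```

The returned points could then be passed as starting centers to any K-Means implementation. The pairwise distance computation is quadratic in the number of points, which is acceptable for a sketch but not for large datasets.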
Pages: 4