An Improved Initialization Method for Clustering High-Dimensional Data

Cited by: 0

Authors:

Zhang, Yanping [1]

Jiang, Qingshan [1]

Affiliations:

[1] Xiamen Univ, Software Sch, Xiamen 361005, Fujian, Peoples R China

Source:

2010 2ND INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS PROCEEDINGS (DBTA) | 2010

Keywords:

K-Means type clustering; initialization method; distance weight coefficient; neighborhood density;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Finding initial centers in high-dimensional space is an important problem that is relevant to many variants of the K-Means algorithm. It is also a very difficult problem, owing to the "curse of dimensionality" and the inherent sparsity of the data. IMSND is one of the latest initialization methods based on the idea of sharing neighborhood density. To address concerns about the accuracy and the input parameters of IMSND, an optimized algorithm is presented that employs a new density measure with a distance weight coefficient to improve search accuracy. Experimental results on real-world datasets show that our algorithm outperforms other algorithms, including IMSND.

Pages: 4
