The k-means forest classifier for high dimensional data

Cited by: 0

Authors:

Chen, Zizhong [1]
Ding, Xin [1]
Xia, Shuyin [1]
Chen, Baiyun [1]

Affiliation:

[1] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing, Peoples R China

Source:

2018 9TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK) | 2018

Funding:

National Natural Science Foundation of China

Keywords:

high dimensional data; attribute noise; k-means forest

DOI:

10.1109/ICBK.2018.00050

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

To the best of our knowledge, the priority search k-means tree algorithm is the most effective k-nearest neighbor (k-NN) algorithm for high-dimensional data. However, it is sensitive to attribute noise, which is common in high-dimensional spaces. This paper therefore presents a new method, named k-means forest, that combines the priority search k-means tree algorithm with the random forest approach. The main idea is to build multiple priority search k-means trees, each of which makes its decisions using a fixed number of randomly selected attributes, and to obtain the final result by voting. We also design a parallel version of the algorithm. Experimental results on artificial and public benchmark data sets demonstrate the effectiveness of the proposed method.
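The abstract describes the method only at a high level. The following is a minimal sketch in Python (standard library only) of the idea as summarised there, not the authors' implementation: each tree is a hierarchical k-means tree built on a random subset of attributes and queried with a best-bin-first priority search, and the trees are combined by majority vote. All names (`build_tree`, `priority_search`, `KMeansForest`, `n_attrs`, `branching`, `leaf_size`, `max_checks`) are hypothetical, and the voting rule (each tree casts one vote, namely the majority label among its k nearest neighbours) is an assumption. The parallel version is not reproduced.

```python
import heapq
import itertools
import random
from collections import Counter


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(idx, data, branching, leaf_size, rng, iters=5):
    """Hierarchical k-means tree: ("leaf", indices) or ("inner", [(center, child), ...])."""
    if len(idx) <= leaf_size:
        return ("leaf", idx)
    centers = [list(data[i]) for i in rng.sample(idx, min(branching, len(idx)))]
    for _ in range(iters):  # a few Lloyd iterations on this node's points only
        groups = [[] for _ in centers]
        for i in idx:
            nearest = min(range(len(centers)), key=lambda c: sqdist(data[i], centers[c]))
            groups[nearest].append(i)
        # New center = mean of its group; an empty group keeps its old center.
        centers = [[sum(data[i][a] for i in g) / len(g) for a in range(len(data[0]))]
                   if g else centers[j] for j, g in enumerate(groups)]
    pairs = [(c, g) for c, g in zip(centers, groups) if g]
    if len(pairs) == 1:  # no real split (e.g. duplicate points): stop here
        return ("leaf", idx)
    return ("inner", [(c, build_tree(g, data, branching, leaf_size, rng, iters))
                      for c, g in pairs])


def priority_search(tree, data, q, k, max_checks):
    """Indices of (approximately) the k nearest points to q, nearest first."""
    tie = itertools.count()             # tie-breaker so heapq never compares nodes
    pending = [(0.0, next(tie), tree)]  # unexplored branches, keyed by center distance
    best = []                           # max-heap (negated distances) of the k best so far
    checked = 0
    while pending and checked < max_checks:
        _, _, node = heapq.heappop(pending)
        while node[0] == "inner":       # descend greedily, remembering the other branches
            dists = [sqdist(q, c) for c, _ in node[1]]
            j = min(range(len(dists)), key=dists.__getitem__)
            for i, (_, child) in enumerate(node[1]):
                if i != j:
                    heapq.heappush(pending, (dists[i], next(tie), child))
            node = node[1][j][1]
        for i in node[1]:               # scan every point in the reached leaf
            d = sqdist(q, data[i])
            checked += 1
            if len(best) < k:
                heapq.heappush(best, (-d, i))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, i))
    return [i for _, i in sorted(best, reverse=True)]


class KMeansForest:
    """Random-subspace ensemble of k-means trees with majority voting (illustrative)."""

    def __init__(self, n_trees=10, n_attrs=2, branching=4, leaf_size=8,
                 max_checks=32, k=3, seed=0):
        self.n_trees, self.n_attrs = n_trees, n_attrs
        self.branching, self.leaf_size = branching, leaf_size
        self.max_checks, self.k = max_checks, k
        self.rng = random.Random(seed)

    def fit(self, X, y):
        self.labels = list(y)
        self.trees = []
        for _ in range(self.n_trees):
            # Each tree sees only a fixed number of randomly chosen attributes.
            attrs = sorted(self.rng.sample(range(len(X[0])), self.n_attrs))
            proj = [[row[a] for a in attrs] for row in X]
            root = build_tree(list(range(len(proj))), proj,
                              self.branching, self.leaf_size, self.rng)
            self.trees.append((attrs, proj, root))
        return self

    def predict(self, X):
        preds = []
        for x in X:
            votes = Counter()
            for attrs, proj, root in self.trees:
                q = [x[a] for a in attrs]
                nn = priority_search(root, proj, q, self.k, self.max_checks)
                # One vote per tree: the majority label among its k neighbours.
                votes[Counter(self.labels[i] for i in nn).most_common(1)[0][0]] += 1
            preds.append(votes.most_common(1)[0][0])
        return preds
```

Setting `max_checks` at or above the number of training points makes each tree's search exhaustive (exact k-NN within that tree's attribute subset); smaller values trade accuracy for speed, which is the purpose of the priority search. Restricting each tree to a random attribute subset is what is meant to limit the influence of any single noisy attribute on the final vote.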

Pages: 322-327

Number of pages: 6

Related articles:

[1] A Parallel K-means Algorithm for High Dimensional Text Data
Shan, Xiaolei
Shen, Yanming
Wang, Yuxin
2018 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-TAIWAN (ICCE-TW), 2018
[2] Sparse kernel k-means for high-dimensional data
Guan, Xin
Terada, Yoshikazu
PATTERN RECOGNITION, 2023, 144
[3] Solving k-means on High-Dimensional Big Data
Kappmeier, Jan-Philipp W.
Schmidt, Daniel R.
Schmidt, Melanie
EXPERIMENTAL ALGORITHMS, SEA 2015, 2015, 9125 : 259 - 270
[4] An AdaBoost Method with K'K-Means Bayes Classifier for Imbalanced Data
Zhang, Yanfeng
Wang, Lichun
MATHEMATICS, 2023, 11 (08)
[5] Robust and sparse k-means clustering for high-dimensional data
Brodinová, Šárka
Filzmoser, Peter
Ortner, Thomas
Breiteneder, Christian
Rohm, Maia
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2019, 13 (04) : 905 - 932
[6] Outlier Robust Geodesic K-means Algorithm for High Dimensional Data
Hassanzadeh, Aidin
Kaarna, Arto
Kauranne, Tuomo
STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2016, 2016, 10029 : 252 - 262
[7] A Novel K-Means Based Clustering Algorithm for High Dimensional Data Sets
Khalilian, Madjid
Mustapha, Norwati
Suliman, Nasir
Mamat, Ali
INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS (IMECS 2010), VOLS I-III, 2010 : 503
[8] Fast Adaptive K-Means Subspace Clustering for High-Dimensional Data
Wang, Xiao-Dong
Chen, Rung-Ching
Yan, Fei
Zeng, Zhi-Qiang
Hong, Chao-Qun
IEEE ACCESS, 2019, 7 : 42639 - 42651
[9] An investigation of K-means clustering to high and multi-dimensional biological data
Baridam, Barilee B.
Ali, M. Montaz
KYBERNETES, 2013, 42 (04) : 614 - 627