The k-means forest classifier for high dimensional data

Cited by: 0

Authors:

Chen, Zizhong [1]

Ding, Xin [1]

Xia, Shuyin [1]

Chen, Baiyun [1]

Affiliation:

[1] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing, Peoples R China

Source:

2018 9TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK) | 2018

Funding:

National Natural Science Foundation of China

Keywords:

high dimensional data; attribute noise; k-means forest;

DOI:

10.1109/ICBK.2018.00050

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

To the best of our knowledge, the priority search k-means tree algorithm is the most effective k-nearest neighbor (kNN) algorithm for high-dimensional data. However, it is sensitive to attribute noise, which is common in high-dimensional spaces. This paper therefore presents a new method, named k-means forest, that combines the priority search k-means tree algorithm with random forest. The main idea is to build multiple priority search k-means trees, each of which makes decisions on a randomly selected subset containing a fixed number of attributes, and to obtain the final result by voting. We also design a parallel version of the algorithm. Experimental results on artificial and public benchmark data sets demonstrate the effectiveness of the proposed method.
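
As a rough illustration of the idea summarized in the abstract, the following Python sketch builds several hierarchical k-means trees, each over a random, fixed-size subset of attributes, answers k-nearest-neighbor queries with a best-bin-first priority search, and combines the trees by voting. It is a minimal sketch under stated assumptions, not the authors' implementation: the names (`KMeansForest`, `build`, `search`, `make_toy_data`), the parameter values (`n_attrs`, `branching`, `leaf_size`, `max_checks`), the two-level voting scheme (each tree votes with the majority label of its k neighbors), and the toy data are all hypothetical, and the parallel version is omitted.

```python
import heapq
import math
import random
from collections import Counter


def assign(points, centers):
    """Put each (vector, index) pair into the group of its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: math.dist(p[0], centers[c]))
        groups[j].append(p)
    return groups


def kmeans(points, k, iters, rng):
    """Lloyd's k-means on (vector, index) pairs; returns centers and groups."""
    centers = [v for v, _ in rng.sample(points, min(k, len(points)))]
    for _ in range(iters):
        groups = assign(points, centers)
        # Move each center to its group mean; an empty group keeps its center.
        centers = [[sum(col) / len(g) for col in zip(*(v for v, _ in g))] if g else c
                   for g, c in zip(groups, centers)]
    return centers, assign(points, centers)


class Node:
    """Inner node: children = [(center, Node)]; leaf: points = [(vector, index)]."""

    def __init__(self, children=None, points=None):
        self.children, self.points = children, points


def build(points, branching, leaf_size, rng, iters=5):
    """Recursively partition points into a hierarchical k-means tree."""
    if len(points) <= leaf_size:
        return Node(points=points)
    centers, groups = kmeans(points, branching, iters, rng)
    pairs = [(c, g) for c, g in zip(centers, groups) if g]
    if len(pairs) < 2:  # degenerate split (e.g. duplicate points): make a leaf
        return Node(points=points)
    return Node(children=[(c, build(g, branching, leaf_size, rng, iters)) for c, g in pairs])


def search(root, q, k, max_checks):
    """Priority search: greedy descent, other branches queued by center distance."""
    heap, tie, seen = [(0.0, 0, root)], 1, []
    while heap and len(seen) < max_checks:
        node = heapq.heappop(heap)[2]
        while node.points is None:
            ranked = sorted((math.dist(q, c), i) for i, (c, _) in enumerate(node.children))
            for d, i in ranked[1:]:
                heapq.heappush(heap, (d, tie, node.children[i][1]))
                tie += 1  # unique tie-breaker so Node objects are never compared
            node = node.children[ranked[0][1]][1]
        seen.extend((math.dist(q, v), idx) for v, idx in node.points)
    return [idx for _, idx in heapq.nsmallest(k, seen)]


class KMeansForest:
    """Hypothetical sketch: each tree indexes n_attrs random attributes; trees vote."""

    def __init__(self, n_trees=15, n_attrs=3, k=3, branching=4, leaf_size=8,
                 max_checks=32, seed=0):
        self.n_trees, self.n_attrs, self.k = n_trees, n_attrs, k
        self.branching, self.leaf_size, self.max_checks = branching, leaf_size, max_checks
        self.rng = random.Random(seed)

    def fit(self, X, y):
        self.y, self.trees = list(y), []
        for _ in range(self.n_trees):
            attrs = sorted(self.rng.sample(range(len(X[0])), self.n_attrs))
            pts = [([x[a] for a in attrs], i) for i, x in enumerate(X)]
            self.trees.append((attrs, build(pts, self.branching, self.leaf_size, self.rng)))
        return self

    def predict(self, x):
        votes = Counter()
        for attrs, root in self.trees:
            nn = search(root, [x[a] for a in attrs], self.k, self.max_checks)
            # One vote per tree: the majority label among its k neighbors.
            votes[Counter(self.y[i] for i in nn).most_common(1)[0][0]] += 1
        return votes.most_common(1)[0][0]


def make_toy_data(seed=1, n_per_class=50):
    """Attributes 0-2 separate the two classes; attributes 3-5 are pure noise."""
    rng, X, y = random.Random(seed), [], []
    for label, mu in ((0, 0.0), (1, 10.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mu, 1.0) for _ in range(3)] + [rng.uniform(0, 10) for _ in range(3)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    forest = KMeansForest().fit(*make_toy_data())
    print(forest.predict([0.0] * 3 + [5.0] * 3), forest.predict([10.0] * 3 + [5.0] * 3))
```

Running the script fits the forest on 100 toy points in which half of the attributes are noise and prints one predicted label for a query near each class center. A tree that happens to draw only noise attributes votes essentially at random, but the other trees outvote it, which is the intuition behind combining random attribute subsets with voting. Because the trees are built independently, the loop in `fit` is a natural candidate for parallel execution; how the authors' parallel version divides the work is not described in this record.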

Pages: 322-327

Number of pages: 6