The k-means forest classifier for high dimensional data

Cited by: 0
Authors
Chen, Zizhong [1]
Ding, Xin [1]
Xia, Shuyin [1]
Chen, Baiyun [1]
Affiliations
[1] Chongqing University of Posts and Telecommunications, School of Computer Science and Technology, Chongqing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
high dimensional data; attribute noise; k-means forest;
DOI
10.1109/ICBK.2018.00050
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
To our knowledge, the priority search k-means tree algorithm is the most effective k-nearest neighbor algorithm for high dimensional data. However, it is sensitive to attribute noise, which is common in high dimensional spaces. This paper therefore presents a new method, named k-means forest, that combines the priority search k-means tree algorithm with random forest. The main idea is to build multiple priority search k-means trees, each making decisions on a fixed number of randomly selected attributes, and to obtain the final result by voting. We also design a parallel version of the algorithm. Experimental results on artificial and public benchmark data sets demonstrate the effectiveness of the proposed method.
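
The idea in the abstract (several k-means trees, each built on a fixed number of randomly chosen attributes, combined by voting) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it replaces priority search with a plain greedy descent to a single leaf, it runs sequentially rather than in parallel, and every name and default in it (`KMeansForestSketch`, `n_trees`, `n_attrs`, `leaf_size`, and so on) is a hypothetical choice made for illustration.

```python
import numpy as np
from collections import Counter


def kmeans(X, k, rng, iters=10):
    """Plain Lloyd's k-means on the rows of X; returns (centers, assignment)."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    # Final assignment so the partition matches the returned centers.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)


def build_tree(X, idx, k, leaf_size, rng):
    """Recursively split the points in idx with k-means until leaves are small."""
    if len(idx) <= max(leaf_size, k):
        return {"leaf": idx}
    centers, assign = kmeans(X[idx], k, rng)
    groups = [idx[assign == j] for j in range(k)]
    if max(len(g) for g in groups) == len(idx):  # degenerate split, stop here
        return {"leaf": idx}
    kept = [j for j in range(k) if len(groups[j]) > 0]
    children = [build_tree(X, groups[j], k, leaf_size, rng) for j in kept]
    return {"centers": centers[kept], "children": children}


def descend(tree, q):
    """Greedy descent to one leaf (a simplification of priority search)."""
    while "leaf" not in tree:
        nearest = ((tree["centers"] - q) ** 2).sum(axis=1).argmin()
        tree = tree["children"][nearest]
    return tree["leaf"]


class KMeansForestSketch:
    """Hypothetical illustration of voting over k-means trees on attribute subsets."""

    def __init__(self, n_trees=15, n_attrs=4, k=3, leaf_size=10,
                 n_neighbors=3, seed=0):
        self.n_trees, self.n_attrs, self.k = n_trees, n_attrs, k
        self.leaf_size, self.n_neighbors, self.seed = leaf_size, n_neighbors, seed

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        rng = np.random.default_rng(self.seed)
        self.trees = []
        for _ in range(self.n_trees):
            # Each tree sees only a fixed-size random subset of the attributes.
            attrs = rng.choice(self.X.shape[1], size=self.n_attrs, replace=False)
            tree = build_tree(self.X[:, attrs], np.arange(len(self.X)),
                              self.k, self.leaf_size, rng)
            self.trees.append((attrs, tree))
        return self

    def predict_one(self, q):
        q = np.asarray(q, dtype=float)
        votes = []
        for attrs, tree in self.trees:
            cand = descend(tree, q[attrs])  # candidate neighbors from one leaf
            d = ((self.X[cand][:, attrs] - q[attrs]) ** 2).sum(axis=1)
            nearest = cand[np.argsort(d)[: self.n_neighbors]]
            # Per-tree k-NN vote among the candidates found in that leaf.
            votes.append(Counter(self.y[nearest].tolist()).most_common(1)[0][0])
        # Final label is the majority vote over all trees.
        return Counter(votes).most_common(1)[0][0]
```

Calling `KMeansForestSketch().fit(X, y).predict_one(q)` returns the majority label for a query vector `q`. Because each tree uses a different attribute subset, a noisy attribute can mislead only the trees that happened to select it, which is the intuition behind the voting step.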
Pages: 322-327
Number of pages: 6