A Cheap Feature Selection Approach for the K-Means Algorithm

Citations: 19
Authors
Capo, Marco [1]
Perez, Aritz [1]
Lozano, Jose A. [1, 2]
Affiliations
[1] Basque Ctr Appl Math, Bilbao 48009, Spain
[2] Univ Basque Country, UPV EHU, Intelligent Syst Grp, Dept Comp Sci & Artificial Intelligence, San Sebastian 20018, Spain
Keywords
Dimensionality reduction; K-means clustering; feature selection; parallelization; unsupervised learning; MEANS CLUSTERING-ALGORITHM
DOI
10.1109/TNNLS.2020.3002576
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The increase in the number of features that need to be analyzed in a wide variety of areas, such as genome sequencing, computer vision, or sensor networks, represents a challenge for the K-means algorithm. In this regard, different dimensionality reduction approaches for the K-means algorithm have been designed recently, leading to algorithms that have proved to generate competitive clusterings. Unfortunately, most of these techniques tend to have fairly high computational costs and/or might not be easy to parallelize. In this article, we propose a fully parallelizable feature selection technique intended for the K-means algorithm. The proposal is based on a novel feature relevance measure that is closely related to the K-means error of a given clustering. Given a disjoint partition of the features, the technique consists of obtaining a clustering for each subset of features and selecting the m features with the highest relevance measure. The computational cost of this approach is just O(m · max{n · K, log m}) per subset of features. We additionally provide a theoretical analysis of the quality of the solution obtained via our proposal and empirically analyze its performance with respect to well-known feature selection and feature extraction techniques. This analysis shows that our proposal consistently obtains results with lower K-means error than all the considered feature selection techniques (Laplacian scores, maximum variance, multicluster feature selection, and random selection), while also requiring similar or lower computational times than these approaches. Moreover, when compared with feature extraction techniques, such as random projections, the proposed approach also shows a noticeable improvement in both error and computational time.
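The procedure outlined in the abstract (cluster each subset of a disjoint feature partition, score every feature, keep the m highest-scoring ones) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not define the relevance measure, so the hypothetical `assumed_relevance` below uses a stand-in, namely the per-feature drop in squared error when moving from a single centroid to the K centroids of the subset's clustering. The function names, the farthest-first initialization, and the toy data are likewise assumptions, and the sketch makes no attempt to reach the stated O(m · max{n · K, log m}) cost.

```python
import numpy as np


def farthest_first_centers(X, K):
    """Deterministic initialization (an assumption, not from the paper)."""
    first = ((X - X.mean(axis=0)) ** 2).sum(axis=1).argmax()
    centers = [X[first]]
    d2 = ((X - X[first]) ** 2).sum(axis=1)
    for _ in range(1, K):
        nxt = d2.argmax()  # point farthest from all centers chosen so far
        centers.append(X[nxt])
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return np.array(centers, dtype=float)


def lloyd_kmeans(X, K, n_iter=20):
    """Plain Lloyd iterations; returns one cluster label per row of X."""
    centers = farthest_first_centers(X, K)
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return labels


def assumed_relevance(X, labels, K):
    """ASSUMED stand-in score: per-feature total minus within-cluster
    sum of squares. The paper's actual measure is not given in the abstract."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for k in range(K):
        members = X[labels == k]
        if len(members) > 0:
            within += ((members - members.mean(axis=0)) ** 2).sum(axis=0)
    return total - within


def select_features(X, partition, K, m):
    """Cluster each feature subset independently, then keep the top-m features."""
    relevance = np.empty(X.shape[1])
    for subset in partition:  # subsets are independent, so this loop could run in parallel
        idx = np.asarray(subset)
        labels = lloyd_kmeans(X[:, idx], K)
        relevance[idx] = assumed_relevance(X[:, idx], labels, K)
    return np.sort(np.argsort(relevance)[-m:])


# Toy data: 200 points, 8 features; only features 0 and 4 separate the two groups.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 8))
X[100:, [0, 4]] += 20.0
selected = select_features(X, partition=[[0, 1, 2, 3], [4, 5, 6, 7]], K=2, m=2)
print(selected)
```

Because each subset is clustered without reference to the others, the loop in `select_features` is the natural place to distribute work across processes, which is the property the abstract describes as fully parallelizable.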
Pages: 2195-2208
Page count: 14