Sparse kernel k-means for high-dimensional data

Cited by: 2
Authors
Guan, Xin [1]
Terada, Yoshikazu [1, 2]
Affiliations
[1] Osaka Univ, Grad Sch Engn Sci, 1 3 Machikaneyamacho, Toyonaka, Osaka 5600043, Japan
[2] RIKEN Ctr Adv Intelligence Project AIP, 1 4 1 Nihonbashi, Chuo ku, Tokyo 1030027, Japan
Keywords
Clustering; Feature selection; Kernel method; FEATURE-SELECTION METHOD
DOI
10.1016/j.patcog.2023.109873
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The kernel k-means method usually loses its power when clustering high-dimensional data because of the large number of irrelevant features. We propose a novel sparse kernel k-means clustering method (SKKM) that extends the advantages of kernel k-means to the high-dimensional setting. We assign each feature a 0-1 indicator and optimize an equivalent kernel k-means loss function while penalizing the sum of the indicators. An alternating minimization algorithm is proposed to estimate both the class labels and the feature indicators. We prove the consistency of the proposed method for both clustering and feature selection. In addition, we apply the proposed framework to the normalized cut. Numerical experiments demonstrate that the proposed method performs better than or comparably to existing high-dimensional clustering methods.
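The label half of the alternating scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `gamma`, the function names, and the toy data are all assumptions made here for illustration. The feature-indicator update is left out because the abstract does not state the exact penalized objective; the sketch only shows how a fixed 0-1 indicator vector `w` restricts the kernel and how the class labels are then updated with the standard kernel k-means step.

```python
import numpy as np

def masked_gaussian_kernel(X, w, gamma=1.0):
    """Gaussian kernel using only features whose indicator w_j equals 1 (assumed kernel)."""
    Xs = X[:, w.astype(bool)]
    sq = np.sum(Xs ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Xs @ Xs.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans_loss(K, labels, k):
    """Within-cluster scatter in feature space: tr(K) - sum_c (1/n_c) sum_{i,j in c} K_ij."""
    loss = np.trace(K)
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size:
            loss -= K[np.ix_(idx, idx)].sum() / idx.size
    return loss

def update_labels(K, labels, k):
    """Move each point to the nearest cluster mean in feature space."""
    n = K.shape[0]
    dist = np.full((n, k), np.inf)  # empty clusters stay at infinite distance
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        # ||phi(x_i) - m_c||^2 = K_ii - 2 * mean_j K_ij + mean_{j,l} K_jl
        dist[:, c] = (np.diag(K)
                      - 2.0 * K[:, idx].mean(axis=1)
                      + K[np.ix_(idx, idx)].mean())
    return dist.argmin(axis=1)

def fit_labels(X, w, k, gamma=1.0, n_iter=50, seed=0):
    """Run label updates for a fixed indicator vector w; return labels and the loss trace."""
    rng = np.random.default_rng(seed)
    K = masked_gaussian_kernel(X, w, gamma)
    labels = rng.integers(k, size=X.shape[0])
    losses = [kernel_kmeans_loss(K, labels, k)]
    for _ in range(n_iter):
        new_labels = update_labels(K, labels, k)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        losses.append(kernel_kmeans_loss(K, labels, k))
    return labels, losses

# Toy data (hypothetical): only the first two of twenty features carry cluster structure.
rng = np.random.default_rng(1)
n, p = 60, 20
X = rng.normal(size=(n, p))
X[: n // 2, :2] += 4.0
w = np.zeros(p, dtype=int)
w[:2] = 1  # indicator vector selecting the two informative features
labels, losses = fit_labels(X, w, k=2, gamma=0.5)
```

In a full alternating scheme, a step that updates `w` for fixed labels would be interleaved with `update_labels`. Setting `w` to all ones recovers ordinary kernel k-means on every feature, which is the setting the abstract describes as losing power in high dimensions.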
Pages: 11