Sparse kernel k-means for high-dimensional data

Cited by: 2

Authors:

Guan, Xin [1]

Terada, Yoshikazu [1, 2]

Affiliations:

[1] Osaka Univ, Grad Sch Engn Sci, 1-3 Machikaneyama-cho, Toyonaka, Osaka 560-0043, Japan

[2] RIKEN Ctr Adv Intelligence Project AIP, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan

Source:

PATTERN RECOGNITION | 2023 / Vol. 144

Keywords:

Clustering; Feature selection; Kernel method; FEATURE-SELECTION METHOD;

DOI:

10.1016/j.patcog.2023.109873

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The kernel k-means method usually loses its power when clustering high-dimensional data, due to a large number of irrelevant features. We propose a novel sparse kernel k-means clustering (SKKM) to extend the advantages of kernel k-means to the high-dimensional cases. We assign each feature a 0-1 indicator and optimize an equivalent kernel k-means loss function while penalizing the sum of the indicators. An alternating minimization algorithm is proposed to estimate both the class labels and the feature indicators. We prove the consistency of both clustering and feature selection of the proposed method. In addition, we apply the proposed framework to the normalized cut. In the numerical experiments, we demonstrate that the proposed method provides better/comparable performance compared to the existing high-dimensional clustering methods.
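
The alternating scheme described in the abstract can be illustrated with a small sketch. This is not the authors' SKKM formulation: it assumes an additive kernel built from one Gaussian kernel per feature, scores each feature by its between-cluster kernel criterion, and keeps a feature only when that score exceeds a penalty `lam`. The function names, the toy data, and the values of `sigma` and `lam` are all hypothetical choices made for illustration.

```python
import math

def feature_kernels(X, sigma=1.0):
    """One Gaussian Gram matrix per feature (assumed kernel choice)."""
    n, p = len(X), len(X[0])
    return [[[math.exp(-((X[a][j] - X[b][j]) ** 2) / (2 * sigma ** 2))
              for b in range(n)] for a in range(n)] for j in range(p)]

def combine(Ks, s):
    """Additive kernel over the features whose 0-1 indicator is 1."""
    n = len(Ks[0])
    return [[sum(K[a][b] for K, sj in zip(Ks, s) if sj)
             for b in range(n)] for a in range(n)]

def between(K, labels, k):
    """Between-cluster criterion: sum_c S_cc/|C_c| - S_total/n."""
    n = len(K)
    value = -sum(map(sum, K)) / n
    for c in range(k):
        idx = [i for i in range(n) if labels[i] == c]
        if idx:
            value += sum(K[a][b] for a in idx for b in idx) / len(idx)
    return value

def update_labels(K, labels, k, iters=20):
    """Lloyd-style kernel k-means reassignment under a fixed kernel."""
    n = len(K)
    for _ in range(iters):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        within = [sum(K[a][b] for a in idx for b in idx) / len(idx) ** 2
                  if idx else 0.0 for idx in clusters]
        new = []
        for i in range(n):
            # Squared distance to each centroid, without the constant K[i][i].
            d = [within[c] - 2 * sum(K[i][b] for b in idx) / len(idx)
                 if idx else float("inf") for c, idx in enumerate(clusters)]
            new.append(d.index(min(d)))
        if new == labels:
            break
        labels = new
    return labels

def sparse_kernel_kmeans(X, k, lam, init_labels, sigma=1.0, rounds=10):
    """Alternate between label updates and 0-1 feature-indicator updates."""
    Ks = feature_kernels(X, sigma)
    s = [1] * len(Ks)
    labels = list(init_labels)
    for _ in range(rounds):
        labels = update_labels(combine(Ks, s), labels, k)
        scores = [between(Kj, labels, k) for Kj in Ks]
        new_s = [int(score > lam) for score in scores]
        if not any(new_s):  # always keep the single best feature
            new_s[scores.index(max(scores))] = 1
        if new_s == s:
            break
        s = new_s
    return labels, s

# Toy data: features 0 and 1 separate two groups; features 2 and 3 are noise.
X_demo = [
    [0.0, 0.1, 0.5, 2.0], [0.2, 0.0, 1.5, 1.0],
    [0.1, 0.3, 1.0, 0.5], [0.3, 0.2, 2.0, 1.5],
    [3.0, 3.1, 1.5, 0.5], [3.2, 2.9, 0.5, 1.5],
    [2.9, 3.3, 2.0, 1.0], [3.1, 3.0, 1.0, 2.0],
]
labels, selected = sparse_kernel_kmeans(
    X_demo, k=2, lam=1.0, init_labels=[0, 0, 0, 1, 1, 1, 1, 0])
print(labels, selected)
```

On this toy input the two noise features take the same set of values in both groups, so their between-cluster score is essentially zero and the indicator step drops them. A larger `lam` removes more features; how the penalty is tuned in the paper is not stated in this record.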


Pages: 11

50 records in total

[31] Weighted kernel K-means for clustering spatial data
Faculty of Computer Science and Information Systems, University Technology Malaysia, Skudai 81310 Johor, Malaysia
[J]. WSEAS Trans. Syst, 2006, 6 (1301-1308):
[32] Deterministic Coresets for k-Means of Big Sparse Data
Barger, Artem
Feldman, Dan
[J]. ALGORITHMS, 2020, 13 (04)
[33] On the anonymization of sparse high-dimensional data
Ghinita, Gabriel
Tao, Yufei
Kalnis, Panos
[J]. 2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 715 - +
[34] Interpolation of sparse high-dimensional data
Lux, Thomas C. H.
Watson, Layne T.
Chang, Tyler H.
Hong, Yili
Cameron, Kirk
[J]. NUMERICAL ALGORITHMS, 2021, 88 (01) : 281 - 313
[35] Interpolation of sparse high-dimensional data
Thomas C. H. Lux
Layne T. Watson
Tyler H. Chang
Yili Hong
Kirk Cameron
[J]. Numerical Algorithms, 2021, 88 : 281 - 313
[36] Sparse Regularization in Fuzzy c-Means for High-Dimensional Data Clustering
Chang, Xiangyu
Wang, Qingnan
Liu, Yuewen
Wang, Yu
[J]. IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (09) : 2616 - 2627
[37] Sparse probabilistic K-means
Jung, Yoon Mo
Whang, Joyce Jiyoung
Yun, Sangwoon
[J]. APPLIED MATHEMATICS AND COMPUTATION, 2020, 382
[38] Sparse Subspace K-means
Diallo, Abdoul Wahab
Niang, Ndeye
Ouattara, Mory
[J]. 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021, : 678 - 685
[39] Sparse Bayesian variable selection in kernel probit model for analyzing high-dimensional data
Yang, Aijun
Tian, Yuzhu
Li, Yunxian
Lin, Jinguan
[J]. COMPUTATIONAL STATISTICS, 2020, 35 (01) : 245 - 258
[40] Sparse Bayesian variable selection in kernel probit model for analyzing high-dimensional data
Aijun Yang
Yuzhu Tian
Yunxian Li
Jinguan Lin
[J]. Computational Statistics, 2020, 35 : 245 - 258
