A Framework for Feature Selection in Clustering

Cited by: 410
Authors
Witten, Daniela M. [1]
Tibshirani, Robert [1, 2]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA);
Keywords
Hierarchical clustering; High-dimensional; K-means clustering; Lasso; Model selection; Sparsity; Unsupervised learning; VARIABLE SELECTION; PRINCIPAL-COMPONENTS; OBJECTS; NUMBER;
DOI
10.1198/jasa.2010.tm09415
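Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code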
020208 ; 070103 ; 0714 ;
Abstract
We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters. These approaches are demonstrated on simulated and genomic data.
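The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a lasso-type penalty can drive feature selection in K-means. It assumes one common formulation: nonnegative feature weights w are chosen to maximize the weighted between-cluster sum of squares, subject to an L2 bound of 1 and an L1 bound s, alternating with ordinary K-means on the reweighted features. All function names, the farthest-first initialization, and the toy data are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np

def soft_threshold(a, delta):
    # Shrink each entry toward zero by delta; entries below delta become 0.
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def update_weights(bcss, s, n_iter=50):
    # Assumed weight update: L2-normalized soft-thresholding of the
    # per-feature between-cluster sums of squares, with the threshold
    # found by bisection so that the L1 norm is (approximately) at most s.
    # Assumes at least one entry of bcss is positive and s >= 1.
    a = np.maximum(bcss, 0.0)
    w = a / np.linalg.norm(a)
    if w.sum() <= s:
        return w
    lo, hi = 0.0, a.max()
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        w = soft_threshold(a, mid)
        w = w / np.linalg.norm(w)
        if w.sum() > s:
            lo = mid
        else:
            hi = mid
    return w

def kmeans(X, k, n_iter=50):
    # Plain Lloyd iterations with a deterministic farthest-first start
    # (an illustrative choice, not taken from the paper).
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def bcss_per_feature(X, labels):
    # Between-cluster sum of squares for each feature: total minus within.
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return total - within

def sparse_kmeans(X, k, s, n_outer=10):
    # Alternate between clustering on reweighted features and updating w.
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))
    for _ in range(n_outer):
        labels = kmeans(X * np.sqrt(w), k)
        w = update_weights(bcss_per_feature(X, labels), s)
    return labels, w

# Toy data: 60 observations, 20 features, clusters differ in the first 2 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))
X[:30, :2] += 8.0
labels, w = sparse_kmeans(X, k=2, s=1.3)
print(np.round(w, 3))
```

In this sketch a smaller s forces more weights to exactly zero, so the bound s plays the role of the tuning parameter that controls how many features take part in the clustering.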
Pages: 713 - 726
Page count: 14
Related papers
50 in total
  • [41] A clustering-based feature selection via feature separability
    Jiang, Shengyi
    Wang, Lianxi
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2016, 31 (02) : 927 - 937
  • [42] Feature selection for genomic data sets through feature clustering
    Zheng, Fengbin
    Shen, Xiajiong
    Fu, Zhengye
    Zheng, Shanshan
    Li, Guangrong
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2010, 4 (02) : 228 - 240
  • [43] Feature clustering-Assisted feature selection with differential evolution
    Wang, Peng
    Xue, Bing
    Liang, Jing
    Zhang, Mengjie
    PATTERN RECOGNITION, 2023, 140
  • [44] An efficient unsupervised feature selection procedure through feature clustering
    Yan, Xuyang
    Nazmi, Shabnam
    Erol, Berat A.
    Homaifar, Abdollah
    Gebru, Biniam
    Tunstel, Edward
    PATTERN RECOGNITION LETTERS, 2020, 131 : 277 - 284
  • [45] Feature ranking based consensus clustering for feature subset selection
    Rani, D. Sandhya
    Rani, T. Sobha
    Bhavani, S. Durga
    Krishna, G. Bala
    APPLIED INTELLIGENCE, 2024, 54 (17-18) : 8154 - 8169
  • [46] An automated parameter selection approach for simultaneous clustering and feature selection
    Kumar, Vijay
    Chhabra, Jitender K.
    Kumar, Dinesh
    JOURNAL OF ENGINEERING RESEARCH, 2016, 4 (02) : 65 - 85
  • [47] A Framework for Feature Selection to Exploit Feature Group Structures
    Perera, Kushani
    Chan, Jeffrey
    Karunasekera, Shanika
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT I, 2020, 12084 : 792 - 804
  • [48] A New Feature Selection Method for Text Clustering
    Xu, Junling
    Wuhan University Journal of Natural Sciences, 2007, (05) : 912 - 916
  • [49] Feature Subset Selection Using Consensus Clustering
    Rani, D. Sandhya
    Rani, T. Sobha
    Bhavani, S. Durga
    2015 EIGHTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION (ICAPR), 2015, : 57 - +
  • [50] Subspace clustering guided unsupervised feature selection
    Zhu, Pengfei
    Zhu, Wencheng
    Hu, Qinghua
    Zhang, Changqing
    Zuo, Wangmeng
    PATTERN RECOGNITION, 2017, 66 : 364 - 374