A Framework for Feature Selection in Clustering

Cited by: 410
Authors
Witten, Daniela M. [1 ]
Tibshirani, Robert [1 ,2 ]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
Hierarchical clustering; High-dimensional; K-means clustering; Lasso; Model selection; Sparsity; Unsupervised learning; VARIABLE SELECTION; PRINCIPAL-COMPONENTS; OBJECTS; NUMBER;
DOI
10.1198/jasa.2010.tm09415
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters. These approaches are demonstrated on simulated and genomic data.
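The abstract describes the method only in words, so the following is a minimal sketch of how a sparse K-means of this kind could look, not the authors' implementation. It assumes the commonly described formulation: nonnegative feature weights w with an L2 norm of 1 and an L1 norm of at most s, chosen to maximize the weighted between-cluster sum of squares, with soft-thresholding as the lasso-type step. The function names, the tuning parameter `s`, the farthest-point initialization, and the plain Lloyd iterations are all illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means with a deterministic farthest-point start (assumption)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def between_cluster_ss(X, labels, k):
    """Per-feature between-cluster sum of squares: total SS minus within SS."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for j in range(k):
        Xj = X[labels == j]
        if len(Xj):
            within += ((Xj - Xj.mean(axis=0)) ** 2).sum(axis=0)
    return total - within

def update_weights(bcss, s, n_bisect=50):
    """Soft-threshold and rescale so that ||w||_2 = 1 and ||w||_1 <= s."""
    if s < 1:
        raise ValueError("s must be at least 1 for the constraints to be feasible")
    a = np.maximum(bcss, 0.0)
    if not a.any():
        return np.full(a.size, 1.0 / np.sqrt(a.size))
    w = a / np.linalg.norm(a)
    if w.sum() <= s:
        return w  # no thresholding needed
    # Fallback that is always feasible: all weight on the best feature.
    w_best = np.zeros(a.size)
    w_best[np.argmax(a)] = 1.0
    lo, hi = 0.0, a.max()
    for _ in range(n_bisect):  # bisection on the threshold
        delta = (lo + hi) / 2.0
        t = np.maximum(a - delta, 0.0)
        w = t / np.linalg.norm(t)
        if w.sum() > s:
            lo = delta
        else:
            hi, w_best = delta, w
    return w_best

def sparse_kmeans(X, k, s, n_outer=10, tol=1e-4):
    """Alternate between clustering on weighted features and updating weights."""
    X = X - X.mean(axis=0)
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))
    for _ in range(n_outer):
        labels = kmeans(X * np.sqrt(w), k)
        w_new = update_weights(between_cluster_ss(X, labels, k), s)
        converged = np.abs(w_new - w).sum() < tol * np.abs(w).sum()
        w = w_new
        if converged:
            break
    return labels, w
```

Calling `sparse_kmeans(X, k=3, s=2.0)` on an n-by-p array returns cluster labels and a weight vector; features whose weight is exactly zero play no part in the clustering. Smaller values of `s` give sparser weights, and how to choose `s` is outside the scope of this sketch.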
Pages: 713-726
Number of pages: 14
Related papers
50 in total
  • [1] A Local SVD Framework for Stable Feature Selection for Clustering
    Alelyani, Salem
    Liu, Huan
    2015 IEEE 16TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2015, : 236 - 243
  • [2] A Feature Selection Framework Based on Supervised Data Clustering
    Liu, Hongzhi
    Fu, Bin
    Jiang, Zhengshen
    Wu, Zhonghai
    Hsu, D. Frank
    2016 IEEE 15TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2016, : 316 - 321
  • [3] Flexible Subspace Clustering: A Joint Feature Selection and K-Means Clustering Framework
    Long, Zhong-Zhen
    Xu, Guoxia
    Du, Jiao
    Zhu, Hu
    Yan, Taiyu
    Yu, Yu-Feng
    BIG DATA RESEARCH, 2021, 23
  • [4] Feature selection for clustering
    Dash, M
    Liu, H
    KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS: CURRENT ISSUES AND NEW APPLICATIONS, 2000, 1805 : 110 - 121
  • [5] HMOSHSSA: a novel framework for solving simultaneous clustering and feature selection problems
    Kumar, Vijay
    Kumari, Rajani
    Kumar, Sandeep
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (35) : 82149 - 82175
  • [6] Simultaneous feature selection and symmetry based clustering using multiobjective framework
    Saha, Sriparna
    Spandana, Rachamadugu
    Ekbal, Asif
    Bandyopadhyay, Sanghamitra
    APPLIED SOFT COMPUTING, 2015, 29 : 479 - 486
  • [7] A Framework for Feature Selection in Clustering (vol 105, pg 713, 2010)
    Witten, Daniela
    Tibshirani, Robert
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2010, 105 (492) : 1637 - 1637
  • [8] Unsupervised Feature Selection with Feature Clustering
    Cheung, Yiu-ming
    Jia, Hong
    2012 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2012), VOL 1, 2012, : 9 - 15
  • [9] Feature Selection for Visual Clustering
    Alagambigai, P.
    Thangavel, K.
    2009 INTERNATIONAL CONFERENCE ON ADVANCES IN RECENT TECHNOLOGIES IN COMMUNICATION AND COMPUTING (ARTCOM 2009), 2009, : 498 - +
  • [10] On feature selection through clustering
    Butterworth, R
    Piatetsky-Shapiro, G
    Simovici, DA
    FIFTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2005, : 581 - 584