A Novel Unsupervised Feature Selection Method for Bioinformatics Data Sets through Feature Clustering

Cited by: 19
Authors
Li, Guangrong [1 ,2 ]
Hu, Xiaohua [3 ]
Shen, Xiajiong [4 ]
Chen, Xin [3 ]
Li, Zhoujun [5 ]
Affiliations
[1] Wuhan Univ, Sch Comp, Wuhan, Peoples R China
[2] Hunan Univ, Coll Accounting, Changsha, Peoples R China
[3] Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA
[4] Henan Univ, Coll Comp & Informat Engn, Henan, Peoples R China
[5] Beihang Univ, Dept Comp Sci, Beijing, Peoples R China
Keywords
DOI
10.1109/GRC.2008.4664788
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many feature selection methods have been proposed, most of them in the supervised learning paradigm. Recently, unsupervised feature selection has attracted considerable attention, especially in bioinformatics and text mining. So far, supervised and unsupervised feature selection methods have been studied and developed separately. A subset selected by a supervised feature selection method may not be a good one for unsupervised learning, and vice versa. In bioinformatics research, however, it is very common to perform clustering and classification iteratively on the same data sets, especially in gene expression analysis; a feature selection method that works well for both unsupervised and supervised learning is therefore highly desirable. In this paper we propose a novel feature selection algorithm through feature clustering. Our algorithm does not need the class label information in the data set and is suitable for both supervised and unsupervised learning. It groups the features into clusters based on feature similarity, so that the features in the same cluster are similar to each other. A representative feature is then selected from each cluster, which reduces feature redundancy. The algorithm uses feature similarity for redundancy reduction but requires no feature search, and it works very well for high-dimensional data sets. We test our algorithm on several biological data sets for both clustering and classification analysis, and the results indicate that our FSFC algorithm can significantly reduce the dimensionality of the original data sets without sacrificing the quality of clustering and classification.
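The abstract describes the approach only at a high level: cluster features by similarity, then keep one representative per cluster. The following is a minimal sketch of that general idea, not the authors' FSFC implementation. The similarity measure (absolute Pearson correlation), the greedy "leader" clustering rule, the `threshold` parameter, and the function names are all assumptions chosen for illustration; the record does not say which measure or clustering procedure the paper uses.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features_by_clustering(rows, threshold=0.9):
    """Group similar feature columns and keep one representative per group.

    rows: list of samples, each a list of numeric feature values.
    threshold: hypothetical similarity cutoff for joining a cluster.
    No class labels are used at any point.
    """
    columns = [list(col) for col in zip(*rows)]
    n_features = len(columns)

    # Assumed similarity measure: absolute Pearson correlation between columns.
    sim = [[abs(pearson(columns[i], columns[j])) for j in range(n_features)]
           for i in range(n_features)]

    # Assumed clustering rule: a feature joins the first cluster whose
    # leader (first member) is similar enough, otherwise it starts a new one.
    clusters = []
    for f in range(n_features):
        for cluster in clusters:
            if sim[f][cluster[0]] >= threshold:
                cluster.append(f)
                break
        else:
            clusters.append([f])

    # Representative: the member most similar, in total, to its own cluster.
    selected = [max(cluster, key=lambda f: sum(sim[f][g] for g in cluster))
                for cluster in clusters]
    return sorted(selected), clusters


if __name__ == "__main__":
    # Toy data: column 1 is a linear function of column 0; column 3 mirrors column 2.
    data = [
        [1, 3, 2, -2],
        [2, 5, 9, -9],
        [3, 7, 1, -1],
        [4, 9, 7, -7],
        [5, 11, 3, -3],
    ]
    chosen, groups = select_features_by_clustering(data, threshold=0.9)
    print("clusters:", groups)
    print("selected feature indices:", chosen)
```

Because the selection step never looks at class labels, the same reduced feature set can be passed to either a clustering or a classification step, which is the property the abstract emphasizes. A single pass over the features like this avoids a subset search, but the resulting clusters depend on feature order; a hierarchical or k-medoids clustering would be a reasonable substitute.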
Pages: 41 / +
Page count: 2