A novel feature selection method for large-scale data sets

Cited by: 1
Authors
Chen, Wei-Chou [1]
Yang, Ming-Chun [1]
Tseng, Shian-Shyong [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp & Informat Sci, Hsinchu 300, Taiwan
Keywords
machine learning; knowledge discovery; feature selection; bitmap indexing; rough set
DOI
10.3233/IDA-2005-9302
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection aims to find the useful (relevant) features that describe an application domain. Finding the minimal subsets of features that can describe all of the concepts in a given data set is NP-hard. We previously proposed a feature selection method, derived from rough set and bitmap indexing techniques, that efficiently selects the optimal (minimal) feature set for a given data set. Although that method guarantees an optimal solution, its computational cost is very high when the number of features is huge. In this paper, we propose a nearly optimal feature selection method, called the bitmap-based feature selection method with discernibility matrix, which reduces processing time by using a discernibility matrix to record the important features during construction of the cleansing tree. The corresponding indexing and selecting algorithms are also proposed. Finally, experiments and comparisons demonstrate the efficiency and accuracy of the proposed method.
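To make the terms in the abstract concrete, the following is a minimal sketch of a rough-set discernibility matrix stored as bitmaps, followed by a greedy cover over it. It is not the authors' algorithm: the abstract gives no details of the cleansing tree or of the indexing and selecting algorithms, so the greedy step here is a swapped-in, generic heuristic. The function names (`discernibility_entries`, `greedy_select`) and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def discernibility_entries(rows, labels):
    """Build one bitmask per pair of rows that have different labels.

    Bit j of a mask is set when the two rows differ on feature j, so each
    mask lists the features able to tell that pair apart.
    (Hypothetical helper, not taken from the paper.)
    """
    entries = []
    for (a, label_a), (b, label_b) in combinations(zip(rows, labels), 2):
        if label_a == label_b:
            continue  # same concept: nothing to discern
        mask = 0
        for j, (x, y) in enumerate(zip(a, b)):
            if x != y:
                mask |= 1 << j
        if mask:  # a zero mask means the pair is inconsistent; skip it
            entries.append(mask)
    return entries


def greedy_select(entries, n_features):
    """Greedily pick features until every entry contains a chosen feature.

    This is a generic set-cover heuristic, so the result is small but not
    guaranteed to be minimal. (Assumed stand-in for the paper's selection step.)
    """
    selected = []
    remaining = list(entries)
    while remaining:
        # Choose the feature that discerns the most still-uncovered pairs.
        best = max(
            range(n_features),
            key=lambda j: sum((m >> j) & 1 for m in remaining),
        )
        selected.append(best)
        remaining = [m for m in remaining if not (m >> best) & 1]
    return sorted(selected)


# Toy data set: the label depends only on feature 1.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 0, 1]

entries = discernibility_entries(rows, labels)
print(greedy_select(entries, n_features=3))
```

On the toy data the sketch keeps only feature 1, the single feature that separates every pair of rows with different labels. Storing each matrix entry as an integer bitmask keeps the coverage test to a shift and a mask, which is the general appeal of bitmap representations for this kind of problem.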
Pages: 237-251
Number of pages: 15