A novel feature selection method for large-scale data sets

Cited by: 1
Authors
Chen, Wei-Chou [1]
Yang, Ming-Chun [1]
Tseng, Shian-Shyong [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp & Informat Sci, Hsinchu 300, Taiwan
Keywords
machine learning; knowledge discovery; feature selection; bitmap indexing; rough set
DOI
10.3233/IDA-2005-9302
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is the task of finding useful (relevant) features to describe an application domain. The problem of finding the minimal subsets of features that can describe all of the concepts in a given data set is NP-hard. We previously proposed a feature selection method, based on rough set and bitmap indexing techniques, that efficiently selects the optimal (minimal) feature set for a given data set. Although that method is sufficient to guarantee a solution's optimality, its computation cost is very high when the number of features is huge. In this paper, we propose a nearly optimal feature selection method, called the bitmap-based feature selection method with discernibility matrix, which employs a discernibility matrix to record the important features during the construction of the cleansing tree and thereby reduces the processing time. The corresponding indexing and selecting algorithms for this feature selection method are also proposed. Finally, experiments and comparisons are presented, and the results show the efficiency and accuracy of the proposed method.
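The abstract does not spell out the algorithms, so the following is only a rough illustration of the general idea of a discernibility matrix stored as bitmaps, not the authors' method. It assumes a small table of discrete feature values with class labels, records for each pair of differently labeled records a bitmask of the features that tell them apart, and then picks features with a simple greedy cover. The cleansing tree is not reproduced, and the function names `discernibility_masks` and `greedy_reduct` are hypothetical.

```python
from itertools import combinations


def discernibility_masks(rows, labels):
    """Return one bitmask per pair of records with different labels.

    Bit k of a mask is set when feature k has different values in the
    two records, i.e. feature k can discern them. (Illustrative only.)
    """
    masks = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if labels[i] == labels[j]:
            continue  # same concept: nothing to discern
        mask = 0
        for k, (x, y) in enumerate(zip(a, b)):
            if x != y:
                mask |= 1 << k
        if mask:  # skip inconsistent pairs that no feature separates
            masks.append(mask)
    return masks


def greedy_reduct(masks, n_features):
    """Greedily pick features until every mask contains a chosen feature.

    This is a heuristic, so the result is small but not guaranteed minimal.
    """
    selected = 0
    remaining = list(masks)
    while remaining:
        # Choose the feature that discerns the most uncovered pairs.
        best = max(
            range(n_features),
            key=lambda k: sum(1 for m in remaining if (m >> k) & 1),
        )
        selected |= 1 << best
        remaining = [m for m in remaining if not (m >> best) & 1]
    return [k for k in range(n_features) if (selected >> k) & 1]


# Toy data (assumed for illustration): the label equals feature 1.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 0, 1]
masks = discernibility_masks(rows, labels)
print(greedy_reduct(masks, n_features=3))  # [1]
```

Storing each matrix entry as an integer bitmask keeps the coverage checks to shifts and ANDs, which is the kind of saving bitmap techniques aim for; the greedy step trades guaranteed minimality for speed, in the same spirit as a nearly optimal method.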
Pages: 237-251
Number of pages: 15