Using Data Complexity Measures for Thresholding in Feature Selection Rankers

Cited by: 12
Authors
Seijo-Pardo, Borja [1]
Bolon-Canedo, Veronica [1]
Alonso-Betanzos, Amparo [1]
Affiliation
[1] Univ A Coruna, Dept Comp Sci, Campus Elvina S-N, La Coruna 15071, Spain
Keywords
REDUNDANCY; RELEVANCE
DOI
10.1007/978-3-319-44636-3_12
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the last few years, feature selection has become essential for confronting the dimensionality problem by removing irrelevant and redundant information. For this purpose, ranker methods have become a commonly used approach, since they do not compromise computational efficiency. Ranker methods return an ordered ranking of all the features, so a threshold must be established to reduce the number of features to deal with. In this work, a practical subset of features is selected according to three different data complexity measures, releasing the user from the task of choosing a fixed threshold in advance. The proposed approach was tested on six different DNA microarray datasets, which pose a difficult challenge for researchers because of the large number of gene expression features and the small number of patients (samples). The adequacy of the proposed approach in terms of classification error was checked using an ensemble of ranker methods with a Support Vector Machine as the classifier. This study shows that our approach was able to achieve results competitive with those obtained by the fixed-threshold approach, which is the standard in most research works.
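The abstract does not name the three complexity measures or say how a complexity value is turned into a cut-off, so the following is only a minimal sketch of the general idea under stated assumptions. It uses the maximum Fisher's discriminant ratio (F1), a standard data complexity measure, as a stand-in; maps it to a hypothetical complexity score in [0, 1] (lower means easier); and picks the top-k prefix of a ranking that minimises a hypothetical trade-off between that score and the fraction of features kept, weighted by `alpha`. The names `fisher_f1`, `complexity` and `choose_threshold`, the mapping, the weighting and the default `alpha=0.75` are illustrative assumptions, not the authors' method, and the sketch handles two-class data only.

```python
from math import inf
from statistics import mean, pvariance

# Sketch only: F1 stands in for the paper's complexity measures, which the
# abstract does not name. Two-class problems only.


def fisher_f1(X, y, features):
    """Maximum Fisher's discriminant ratio over the given feature indices.

    A large value means at least one feature separates the two classes
    well, i.e. the problem looks easy on that subset.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch only handles two-class problems")
    best = 0.0
    for j in features:
        a = [row[j] for row, lab in zip(X, y) if lab == labels[0]]
        b = [row[j] for row, lab in zip(X, y) if lab == labels[1]]
        gap = (mean(a) - mean(b)) ** 2        # squared distance between class means
        spread = pvariance(a) + pvariance(b)  # sum of within-class variances
        if spread == 0:
            ratio = inf if gap > 0 else 0.0
        else:
            ratio = gap / spread
        best = max(best, ratio)
    return best


def complexity(X, y, features):
    """Hypothetical mapping of F1 into [0, 1]: lower means easier."""
    return 1.0 / (1.0 + fisher_f1(X, y, features))


def choose_threshold(X, y, ranking, alpha=0.75):
    """Return the top-k prefix of `ranking` minimising the hypothetical score
    alpha * complexity + (1 - alpha) * fraction of features kept.
    Ties go to the smaller subset because the comparison is strict."""
    n = len(ranking)
    best_k, best_score = 1, inf
    for k in range(1, n + 1):
        score = alpha * complexity(X, y, ranking[:k]) + (1 - alpha) * k / n
        if score < best_score:
            best_k, best_score = k, score
    return ranking[:best_k]


# Toy two-class data: only feature 2 separates the classes.
X = [
    [1.0, 5.0, 0.0],
    [2.0, 3.0, 0.1],
    [1.5, 4.0, 0.2],
    [1.0, 4.0, 1.0],
    [2.0, 5.0, 1.1],
    [1.5, 3.0, 1.2],
]
y = [0, 0, 0, 1, 1, 1]

print(choose_threshold(X, y, ranking=[2, 0, 1]))  # -> [2]
print(choose_threshold(X, y, ranking=[0, 1, 2]))  # -> [0, 1, 2]
```

Because F1 takes the maximum over the retained features, the complexity term can only stay the same or fall as k grows, while the size term always grows; `alpha` decides where the balance falls. With a good ranking the cut-off stops after the single informative feature, whereas a poor ranking forces it to keep everything. In the setting the abstract describes, the ranking would come from an ensemble of rankers and the selected subset would then be evaluated with an SVM.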
Pages: 121-131
Number of pages: 11
Related papers
50 in total
  • [1] Data complexity measures in feature selection
    Okimoto, Lucas C.
    Lorena, Ana C.
    [J]. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [2] Complexity Measures Effectiveness in Feature Selection
    Okimoto, Lucas Chesini
    Savii, Ricardo Manhaes
    Lorena, Ana Carolina
    [J]. 2017 6TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2017, : 91 - 96
  • [3] Feature selection for domain adaptation using complexity measures and swarm intelligence
    Castillo-Garcia, G.
    Moran-Fernandez, L.
    Bolon-Canedo, V.
    [J]. NEUROCOMPUTING, 2023, 548
  • [4] Dynamic selection of normalization techniques using data complexity measures
    Jain, Sukirty
    Shukla, Sanyam
    Wadhvani, Rajesh
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2018, 106 : 252 - 262
  • [5] Revisiting Feature Selection with Data Complexity
    Ngan Thi Dong
    Khosla, Megha
    [J]. 2020 IEEE 20TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE 2020), 2020, : 211 - 216
  • [6] Centralized vs. distributed feature selection methods based on data complexity measures
    Moran-Fernandez, L.
    Bolon-Canedo, V.
    Alonso-Betanzos, A.
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 117 : 27 - 45
  • [7] Feature Popularity Between Different Web Attacks with Supervised Feature Selection Rankers
    Zuech, Richard
    Hancock, John
    Khoshgoftaar, Taghi M.
    [J]. 20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 30 - 37
  • [8] Instance Ranking Using Data Complexity Measures for Training Set Selection
    Alam, Junaid
    Rani, T. Sobha
    [J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2019, PT I, 2019, 11941 : 179 - 188
  • [9] POLARIMETRIC SAR DATA FEATURE SELECTION USING MEASURES OF MUTUAL INFORMATION
    Tanase, R.
    Radoi, A.
    Datcu, M.
    Raducanu, D.
    [J]. 2015 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2015, : 1140 - 1143
  • [10] Fuzzy Information Measures Feature Selection Using Descriptive Statistics Data
    Salem, Omar A. M.
    Liu, Haowen
    Liu, Feng
    Chen, Yi-Ping Phoebe
    Chen, Xi
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2022, PT III, 2022, 13370 : 77 - 90