Data complexity measures in feature selection

Cited by: 0
Authors
Okimoto, Lucas C. [1]
Lorena, Ana C. [2]
Affiliations
[1] Univ Fed Sao Paulo UNIFESP, Inst Ciencia & Tecnol ICT, Sao Jose Dos Campos, SP, Brazil
[2] Inst Tecnol Aeronaut ITA, Div Ciencia Comp IEC, Sao Jose Dos Campos, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Machine Learning; feature selection; data complexity; EFFICIENT FEATURE-SELECTION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection (FS) is a pre-processing step that is often mandatory in data analysis with Machine Learning techniques. Its objective is to reduce data dimensionality by identifying and keeping only the relevant features of a dataset. In this work we evaluate the use of complexity measures of classification problems in FS. These descriptors estimate the intrinsic difficulty of a classification problem from characteristics of the dataset available for learning. We propose a combined univariate-multivariate FS technique that employs two complexity measures: Fisher's maximum discriminant ratio and the sum of intra-extra class distances. The results indicate that the complexity measures are indeed suitable for estimating feature importance in classification datasets. Large reductions in the number of features were obtained while, in general, preserving the predictive accuracy of two strong classification techniques: Support Vector Machines and Random Forests.
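As a rough illustration of the univariate part of this idea, the sketch below ranks features by a per-feature Fisher discriminant ratio. It is not the authors' implementation: the function names `fisher_ratio_per_feature` and `rank_features` are hypothetical, and the multi-class formulation (between-class scatter divided by within-class scatter, computed per feature) is an assumption based on a common definition of the measure. The abstract does not specify the exact formula or how the two measures are combined.

```python
import numpy as np


def fisher_ratio_per_feature(X, y):
    """Per-feature Fisher discriminant ratio (assumed multi-class form).

    For each feature: sum_c n_c * (mean_c - mean)^2 divided by
    sum_c sum_i (x_i - mean_c)^2. Larger values suggest the feature
    separates the classes better on its own.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        between += Xc.shape[0] * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)

    ratio = np.zeros(X.shape[1])
    has_spread = within > 0
    ratio[has_spread] = between[has_spread] / within[has_spread]
    # No within-class spread but distinct class means: perfectly separable.
    ratio[~has_spread & (between > 0)] = np.inf
    return ratio


def rank_features(X, y):
    """Return feature indices ordered from most to least discriminative."""
    ratio = fisher_ratio_per_feature(X, y)
    return np.argsort(-ratio, kind="stable")


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    X = np.array([[0.0, 1.0], [0.2, 0.0], [1.0, 0.0], [1.2, 1.0]])
    y = np.array([0, 0, 1, 1])
    print(fisher_ratio_per_feature(X, y))
    print(rank_features(X, y))
```

On the toy data, feature 0 is ranked first because its class means differ while feature 1 has identical class means. A selection step could then keep only the top-ranked features. Where to place the cut-off is a separate design choice that this sketch does not address.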
Pages: 8
Related papers
50 in total
  • [1] Using Data Complexity Measures for Thresholding in Feature Selection Rankers
    Seijo-Pardo, Borja
    Bolon-Canedo, Veronica
    Alonso-Betanzos, Amparo
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2016, 2016, 9868 : 121 - 131
  • [2] Complexity Measures Effectiveness in Feature Selection
    Okimoto, Lucas Chesini
    Savii, Ricardo Manhaes
    Lorena, Ana Carolina
    [J]. 2017 6TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2017, : 91 - 96
  • [3] Revisiting Feature Selection with Data Complexity
    Ngan Thi Dong
    Khosla, Megha
    [J]. 2020 IEEE 20TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE 2020), 2020, : 211 - 216
  • [4] Centralized vs. distributed feature selection methods based on data complexity measures
    Moran-Fernandez, L.
    Bolon-Canedo, V.
    Alonso-Betanzos, A.
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 117 : 27 - 45
  • [5] Classifier selection based on data complexity measures
    Hernández-Reyes, E
    Carrasco-Ochoa, JA
    Martínez-Trinidad, JF
    [J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS, 2005, 3773 : 586 - 592
  • [6] Parameterized Complexity of Feature Selection for Categorical Data Clustering
    Bandyapadhyay, Sayan
    Fomin, Fedor V.
    Golovach, Petr A.
    Simonov, Kirill
    [J]. ACM TRANSACTIONS ON COMPUTATION THEORY, 2023, 15 (3-4)
  • [7] Reducing Data Complexity in Feature Extraction and Feature Selection for Big Data Security Analytics
    Sisiaridis, Dimitrios
    Markowitch, Olivier
    [J]. 2018 1ST INTERNATIONAL CONFERENCE ON DATA INTELLIGENCE AND SECURITY (ICDIS 2018), 2018, : 43 - 48
  • [8] Feature selection for domain adaptation using complexity measures and swarm intelligence
    Castillo-Garcia, G.
    Moran-Fernandez, L.
    Bolon-Canedo, V.
    [J]. NEUROCOMPUTING, 2023, 548
  • [9] On the Suitability of Combining Feature Selection and Resampling to Manage Data Complexity
    Martin-Felez, Raul
    Mollineda, Ramon A.
    [J]. CURRENT TOPICS IN ARTIFICIAL INTELLIGENCE, 2010, 5988 : 141 - +
  • [10] Dynamic selection of normalization techniques using data complexity measures
    Jain, Sukirty
    Shukla, Sanyam
    Wadhvani, Rajesh
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2018, 106 : 252 - 262