Using Data Complexity Measures for Thresholding in Feature Selection Rankers

Cited by: 13
Authors
Seijo-Pardo, Borja [1]
Bolon-Canedo, Veronica [1]
Alonso-Betanzos, Amparo [1]
Affiliations
[1] Univ A Coruna, Dept Comp Sci, Campus Elvina S-N, La Coruna 15071, Spain
Keywords
REDUNDANCY; RELEVANCE
DOI
10.1007/978-3-319-44636-3_12
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the last few years, feature selection has become essential for tackling the dimensionality problem, since it removes irrelevant and redundant information. Ranker methods are a commonly used approach for this purpose because they do not compromise computational efficiency. A ranker returns an ordered ranking of all the features, so a threshold must be set to reduce the number of features that are kept. In this work, a practical subset of features is selected according to three different data complexity measures, which frees the user from having to choose a fixed threshold in advance. The proposed approach was tested on six DNA microarray datasets, which pose a difficult challenge for researchers because of the high number of gene expression features and the low number of patients. The adequacy of the approach in terms of classification error was checked using an ensemble of ranker methods with a Support Vector Machine as the classifier. The study shows that the approach achieves results competitive with those obtained by the fixed-threshold approach, which is the standard in most research works.
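As an illustration of the idea in the abstract, the following minimal sketch ranks features and then cuts the ranking at the point where a data complexity measure is lowest, instead of at a fixed position. Everything in it is an assumption made for illustration: the abstract does not name the three measures, the rankers in the ensemble, or how they are combined. Here a two-class Fisher ratio stands in for the ranker, and the leave-one-out error of a 1-nearest-neighbour rule (in the spirit of the N3 complexity measure) stands in for the complexity measure. The function names and the toy data are hypothetical.

```python
import math


def fisher_score(column, labels):
    """Two-class Fisher ratio of one feature (assumed stand-in ranker)."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    a, b = groups.values()  # this sketch assumes exactly two classes
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
    var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
    denom = var_a + var_b
    if denom == 0:
        return math.inf if mean_a != mean_b else 0.0
    return (mean_a - mean_b) ** 2 / denom


def rank_features(X, y):
    """Return feature indices ordered from most to least relevant."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def loo_1nn_error(X, y, features):
    """Leave-one-out 1-NN error on the given features (assumed complexity measure)."""
    errors = 0
    for i, row in enumerate(X):
        best_dist, best_label = math.inf, None
        for k, other in enumerate(X):
            if k == i:
                continue
            dist = sum((row[j] - other[j]) ** 2 for j in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        errors += best_label != y[i]
    return errors / len(X)


def select_by_complexity(X, y):
    """Keep the shortest ranking prefix that minimises the complexity measure."""
    ranking = rank_features(X, y)
    best_k, best_complexity = 1, math.inf
    for k in range(1, len(ranking) + 1):
        complexity = loo_1nn_error(X, y, ranking[:k])
        if complexity < best_complexity:  # strict: ties keep the smaller subset
            best_k, best_complexity = k, complexity
    return ranking[:best_k], best_complexity


# Hypothetical toy data: 6 samples, 3 features, 2 classes.
X = [
    [0.0, 5.0, 1.0],
    [0.2, 1.0, 0.0],
    [0.1, 3.0, 1.2],
    [1.0, 4.9, 2.0],
    [1.2, 1.1, 3.1],
    [0.9, 3.1, 2.2],
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_features(X, y)
selected, complexity = select_by_complexity(X, y)
print(ranking, selected, complexity)
```

The strict comparison in `select_by_complexity` is a design choice of this sketch: when several cut points give the same complexity, the smallest subset wins, which matches the goal of reducing the number of features.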
Pages: 121-131
Page count: 11