Using Data Complexity Measures for Thresholding in Feature Selection Rankers

Times cited: 13
Authors
Seijo-Pardo, Borja [1]
Bolon-Canedo, Veronica [1]
Alonso-Betanzos, Amparo [1]
Affiliations
[1] Univ A Coruna, Dept Comp Sci, Campus Elvina S-N, La Coruna 15071, Spain
Keywords
REDUNDANCY; RELEVANCE;
DOI
10.1007/978-3-319-44636-3_12
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In recent years, feature selection has become essential for tackling the dimensionality problem, since it removes irrelevant and redundant information. Ranker methods are a commonly used approach for this purpose because they do not compromise computational efficiency. A ranker returns an ordered ranking of all the features, so a threshold must be set to reduce the number of features retained. In this work, a practical subset of features is selected according to three different data complexity measures, which frees the user from having to choose a fixed threshold in advance. The proposed approach was tested on six DNA microarray datasets, which pose a difficult challenge for researchers because of their high number of gene expression features and low number of patients. The adequacy of the approach in terms of classification error was assessed using an ensemble of ranker methods with a Support Vector Machine as the classifier. The study shows that the approach achieves results competitive with those obtained using a fixed threshold, which is the standard in most research works.
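The general idea of replacing a fixed cut-off with a data-driven one can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the three complexity measures, the rankers, or the selection rule, so the sketch uses Fisher's discriminant ratio as both the ranking score and a stand-in complexity measure, and a hypothetical cost that trades off complexity against the fraction of features kept (weighted by a hypothetical parameter `alpha`). It is not the authors' method.

```python
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Fisher discriminant ratio of a single feature for a two-class problem.

    Larger values mean the two classes are easier to separate on this feature.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    # Small constant avoids division by zero when both classes have zero variance.
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-12)


def rank_features(X, labels):
    """Return feature indices ordered from highest to lowest score, plus the scores."""
    n_features = len(X[0])
    scores = [fisher_ratio([row[j] for row in X], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


def select_by_complexity(order, scores, alpha=0.75):
    """Hypothetical rule: choose the ranked prefix that minimizes
    alpha * complexity + (1 - alpha) * fraction_of_features_kept.

    Complexity here is an assumed proxy, 1 / (1 + cumulative score),
    which decreases as more discriminative features are added.
    """
    n = len(order)
    best_k, best_cost = 1, float("inf")
    cumulative = 0.0
    for k, j in enumerate(order, start=1):
        cumulative += scores[j]
        complexity = 1.0 / (1.0 + cumulative)
        cost = alpha * complexity + (1 - alpha) * k / n
        if cost < best_cost:
            best_k, best_cost = k, cost
    return order[:best_k]


# Toy data (made up): 4 samples, 3 features, binary labels.
X = [
    [1.0, 5.0, 2.0],
    [1.2, 1.0, 2.4],
    [3.0, 4.0, 2.6],
    [3.2, 2.0, 3.0],
]
labels = [0, 0, 1, 1]

order, scores = rank_features(X, labels)
selected = select_by_complexity(order, scores)
print(order, selected)
```

The number of retained features falls out of the cost function rather than being fixed beforehand; raising `alpha` in this sketch favors lower complexity over a smaller subset.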
Pages: 121-131
Page count: 11
Related papers
50 in total
  • [31] Feature selection using fuzzy entropy measures with similarity classifier
    Luukka, Pasi
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (04) : 4600 - 4607
  • [32] Feature selection using data envelopment analysis
    Zhang, Yishi
    Yang, Anrong
    Xiong, Chan
    Wang, Teng
    Zhang, Zigang
    KNOWLEDGE-BASED SYSTEMS, 2014, 64 : 70 - 80
  • [33] Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification
    Sun, Lin
    Zhang, Xiaoyu
    Qian, Yuhua
    Xu, Jiucheng
    Zhang, Shiguang
    INFORMATION SCIENCES, 2019, 502 : 18 - 41
  • [34] Image thresholding using measures of fuzziness
    Yumusak, N
    Temurtas, F
    Cerezci, O
    Pazar, S
    IECON '98 - PROCEEDINGS OF THE 24TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY, VOLS 1-4, 1998, : 1300 - 1305
  • [35] Feature Weighting and Selection Using a Hybrid Approach Based on Rademacher Complexity Model Selection
    Giraldo, L. F.
    Delgado, E.
    Castellanos, C. G.
    COMPUTERS IN CARDIOLOGY 2007, VOL 34, 2007, 34 : 257 - 260
  • [36] The Complexity of Feature Selection for Consistent Biclustering
    Kundakcioglu, O. Erhun
    Pardalos, Panos M.
    CLUSTER CHALLENGES IN BIOLOGICAL NETWORKS, 2009, : 257 - 266
  • [37] Feature Selection Under a Complexity Constraint
    Plasberg, Jan H.
    Kleijn, W. Bastiaan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2009, 11 (03) : 565 - 571
  • [38] A Filter Based Feature Selection Approach Using Lempel Ziv Complexity
    Ahmed, Sultan Uddin
    Khan, Md. Fazle Elahi
    Shahjahan, Md.
    ADVANCES IN NEURAL NETWORKS - ISNN 2011, PT II, 2011, 6676 : 260 - +
  • [39] Sparse Group Feature Selection by Weighted Thresholding Homotopy Method
    Wu, Jinglan
    Huang, Huating
    Zhu, Wenxing
    IEEE ACCESS, 2020, 8 (08) : 20700 - 20707
  • [40] Feature selection for pattern recognition by LASSO and thresholding methods - a comparison
    Libal, Urszula
    2011 16TH INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS, 2011, : 168 - 173