Using Data Complexity Measures for Thresholding in Feature Selection Rankers

Cited by: 13
Authors
Seijo-Pardo, Borja [1]
Bolon-Canedo, Veronica [1]
Alonso-Betanzos, Amparo [1]
Affiliations
[1] Univ A Coruna, Dept Comp Sci, Campus Elvina S-N, La Coruna 15071, Spain
Keywords
REDUNDANCY; RELEVANCE;
DOI
10.1007/978-3-319-44636-3_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the last few years, feature selection has become essential for confronting the dimensionality problem by removing irrelevant and redundant information. For this purpose, ranker methods have become a commonly used approach, since they do not compromise computational efficiency. Ranker methods return an ordered ranking of all the features, so a threshold must be established to reduce the number of features to deal with. In this work, a practical subset of features is selected according to three different data complexity measures, releasing the user from the task of choosing a fixed threshold in advance. The proposed approach was tested on six different DNA microarray datasets, which pose a difficult challenge for researchers because of the high number of gene expression features and the low number of patients. The adequacy of the proposed approach in terms of classification error was checked using an ensemble of ranker methods with a Support Vector Machine as the classifier. This study shows that our approach achieved competitive results compared with those obtained by the fixed-threshold approach, which is the standard in most research works.
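The idea of letting a data complexity measure pick the cut-off in a feature ranking can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract does not name its three measures, so the leave-one-out 1-NN error rate (the N3 measure from the data complexity literature) is assumed here, a per-feature Fisher ratio stands in for the ranker, and the data are synthetic. The function names `fisher_scores`, `n3_curve` and `complexity_threshold` are hypothetical.

```python
import numpy as np

def fisher_scores(X, y):
    """Stand-in univariate ranker: per-feature Fisher ratio (binary labels 0/1)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12  # avoid division by zero
    return num / den

def n3_curve(X, y, ranking):
    """Leave-one-out 1-NN error rate for every top-k prefix of the ranking."""
    n = X.shape[0]
    dist2 = np.zeros((n, n))
    np.fill_diagonal(dist2, np.inf)  # a sample is never its own neighbour
    errors = np.empty(len(ranking))
    for i, j in enumerate(ranking):
        col = X[:, j]
        dist2 += (col[:, None] - col[None, :]) ** 2  # add feature j to the distances
        nearest = dist2.argmin(axis=1)
        errors[i] = np.mean(y[nearest] != y)
    return errors

def complexity_threshold(errors):
    """Smallest k whose top-k subset reaches the minimum complexity value."""
    return int(np.argmin(errors)) + 1

# Synthetic stand-in for a microarray dataset: few samples, many features.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 200
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_features))
X[:, :5] += 3.0 * y[:, None]  # assumption: only the first 5 features are informative

ranking = np.argsort(-fisher_scores(X, y))  # best feature first
errors = n3_curve(X, y, ranking)
k = complexity_threshold(errors)
selected = ranking[:k]
print(f"kept {k} of {n_features} features, N3 = {errors[k - 1]:.3f}")
```

Picking the first minimum of the curve is one possible rule; a criterion that also penalises the number of retained features would be an equally reasonable variant.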
Pages: 121-131
Number of pages: 11