Feature Selection for High-Dimensional Data Through Instance Vote Combining

Cited by: 0

Authors:

Chamakura, Lily [1]

Saha, Goutam [1]

Affiliations:

[1] Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India

Source:

PROCEEDINGS OF THE 7TH ACM IKDD CODS AND 25TH COMAD (CODS-COMAD 2020) | 2020

Keywords:

Feature selection; Filter-based method; Set-covering problem; Instance voting; Graph modularity; Vote combining; CLASSIFICATION; PREDICTION; DISCOVERY; CANCER;

DOI:

10.1145/3371158.3371177

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Supervised feature selection (FS) is used to select a discriminative and non-redundant subset of features in classification problems dealing with high-dimensional inputs. In this paper, feature selection is posed as a problem akin to set covering, where the goal is to select a subset of features such that they cover the instances. To solve this formulation, we quantify the local relevance of each feature (i.e., the votes assigned to it by instances), which captures the extent to which a given feature is useful for classifying the individual instances correctly. In this work, we propose to combine the instance votes across features to infer their joint local relevance. The votes are combined on the basis of geometric principles underlying classification and feature spaces. Further, we show how such instance vote combining may be employed to derive a heuristic search strategy for selecting a relevant and non-redundant subset of features. We illustrate the effectiveness of our approach by evaluating the classification performance and robustness to data variations on publicly available benchmark datasets. We observed that the proposed method outperforms state-of-the-art mutual-information-based FS techniques and performs comparably to other heuristic approaches that solve the set-covering formulation of feature selection.
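
The set-covering view described in the abstract can be illustrated with a minimal sketch. This is not the authors' vote-combining heuristic, whose details the abstract does not give; it is the textbook greedy set-cover rule applied to a hypothetical boolean matrix `votes`, where `votes[i, j]` is assumed to be True when instance i votes for feature j. The function name `greedy_cover` and the parameter `max_features` are illustrative assumptions.

```python
import numpy as np

def greedy_cover(votes, max_features=None):
    """Greedy set cover over a boolean instance-by-feature vote matrix.

    Assumption (not from the paper): votes[i, j] is True when feature j
    alone is judged useful for classifying instance i.
    """
    votes = np.asarray(votes, dtype=bool)
    n_instances, n_features = votes.shape
    uncovered = np.ones(n_instances, dtype=bool)
    selected = []
    limit = n_features if max_features is None else max_features
    while uncovered.any() and len(selected) < limit:
        # Count how many still-uncovered instances each feature would cover.
        gains = votes[uncovered].sum(axis=0)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break  # the remaining instances vote for no feature
        selected.append(best)
        uncovered &= ~votes[:, best]
    return selected, uncovered

# Toy example: 5 instances, 3 features; the last instance votes for nothing.
toy_votes = [[1, 0, 0],
             [1, 0, 0],
             [0, 1, 0],
             [0, 1, 1],
             [0, 0, 0]]
selected, uncovered = greedy_cover(toy_votes)
print(selected, uncovered.tolist())
```

In this toy case the third feature is never chosen because the only instance it covers is already covered by the second feature, which loosely mirrors the non-redundancy goal stated in the abstract.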


Pages: 161-169

Page count: 9
