A quantitative benchmark of neural network feature selection methods for detecting nonlinear signals

Cited by: 1
Authors
Passemiers, Antoine [1 ]
Folco, Pietro [2 ]
Raimondi, Daniele [1 ,3 ]
Birolo, Giovanni [2 ]
Moreau, Yves [1 ]
Fariselli, Piero [2 ]
Affiliations
[1] Katholieke Univ Leuven, ESAT STADIUS, Leuven, Belgium
[2] Univ Torino, Dept Med Sci, Turin, Italy
[3] Univ Montpellier, Inst Genet Mol Montpellier, Montpellier, France
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 1
DOI
10.1038/s41598-024-82583-5
Chinese Library Classification: O [Mathematical sciences and chemistry]; P [Astronomy and Earth sciences]; Q [Biosciences]; N [General natural sciences]
Discipline classification codes: 07; 0710; 09
Abstract
Classification and regression problems can be challenging when the relevant input features are diluted in noisy datasets, in particular when the sample size is limited. Traditional Feature Selection (FS) methods address this issue by relying on assumptions such as linearity or additivity of the relationships between features. Recently, a proliferation of Deep Learning (DL) models has emerged to tackle both FS and prediction at the same time, allowing non-linear modeling of the selected features. In this study, we systematically assess the performance of DL-based feature selection methods on synthetic datasets of varying complexity, and benchmark their efficacy in uncovering non-linear relationships between features. We also use the same settings to benchmark the reliability of gradient-based feature attribution techniques for Neural Networks (NNs), such as Saliency Maps (SM). A quantitative evaluation of the reliability of these approaches is currently missing. Our analysis indicates that even simple synthetic datasets can significantly challenge most of the DL-based FS and SM methods, while Random Forests, TreeShap, mRMR and LassoNet are the best-performing FS methods. Our conclusion is that when quantifying the relevance of a few non-linearly entangled predictive features diluted in a large number of irrelevant noisy variables, DL-based FS and SM interpretation methods are still far from being reliable.
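To make concrete the kind of signal the abstract is about, here is a minimal, stdlib-only sketch (an illustrative toy under assumed settings, not the paper's benchmark datasets or methods): only feature 0 drives the target, through a purely quadratic link, so a linear filter such as Pearson correlation scores it near zero, while the single-split variance-reduction criterion used at each node of a regression tree (the building block of Random Forests) singles it out. The sample size, noise level, and candidate split points are illustrative choices.

```python
import random

random.seed(0)

# Toy setup (illustrative only): 10 features, but only feature 0 matters,
# through a purely nonlinear (quadratic) link to the target.
n, d = 2000, 10
X = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
y = [x[0] ** 2 + 0.05 * random.gauss(0, 1) for x in X]

def pearson(a, b):
    """Pearson linear correlation between two equal-length sequences."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = sum((u - ma) ** 2 for u in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (sa * sb)

def variance(v):
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

def stump_gain(col, target):
    """Best variance reduction from a single split on `col`,
    i.e. what one node of a regression tree would measure."""
    base = variance(target)
    pairs = sorted(zip(col, target))
    best = 0.0
    for frac in (0.1, 0.25, 0.5, 0.75, 0.9):  # candidate split points
        k = int(frac * len(pairs))
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        mixed = (len(left) * variance(left) + len(right) * variance(right)) / len(pairs)
        best = max(best, base - mixed)
    return best

cols = [[row[j] for row in X] for j in range(d)]
corrs = [abs(pearson(c, y)) for c in cols]  # linear filter: blind to x**2
gains = [stump_gain(c, y) for c in cols]    # tree-style score: detects it

print("best feature by |corr|:", corrs.index(max(corrs)))
print("best feature by split gain:", gains.index(max(gains)))
```

With this setup, `corrs[0]` stays near zero while `gains[0]` dominates the gains of the nine noise features, mirroring the paper's finding that tree-based scores can recover nonlinear signals that linear criteria miss; the exact values depend on the seed and sample size.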
Pages: 17
Related papers
50 records in total
  • [21] Neural network based approaches for detecting signals with unknown parameters
    de la Mata-Moya, David
    Jarabo-Amores, Pilar
    Rosa-Zurera, Manuel
    Vicen-Bueno, Raul
    Nieto-Borge, Jose Carlos
    2007 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING, CONFERENCE PROCEEDINGS BOOK, 2007, : 675 - 680
  • [22] Detecting and Refactoring Feature Envy Based on Graph Neural Network
    Yu, Dongjin
    Xu, Yihang
    Weng, Lehui
    Chen, Jie
    Chen, Xin
    Yang, Quanxin
    2022 IEEE 33RD INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE 2022), 2022, : 458 - 469
  • [23] Benchmark for filter methods for feature selection in high-dimensional classification data
    Bommert, Andrea
    Sun, Xudong
    Bischl, Bernd
    Rahnenfuehrer, Joerg
    Lang, Michel
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2020, 143
  • [24] Boosting feature selection for Neural Network based regression
    Bailly, Kevin
    Milgram, Maurice
    NEURAL NETWORKS, 2009, 22 (5-6) : 748 - 756
  • [25] The role of feature selection in artificial neural network applications
    Kavzoglu, T
    Mather, PM
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2002, 23 (15) : 2919 - 2937
  • [26] A neural network document classifier with linguistic feature selection
    Lee, HM
    Chen, CM
    Hwang, CW
    INTELLIGENT PROBLEM SOLVING: METHODOLOGIES AND APPROACHES, PROCEEDINGS, 2000, 1821 : 555 - 560
  • [27] NEURAL NETWORK WITH SALIENCY BASED FEATURE SELECTION ABILITY
    Wang, Yunong
    Bian, Huanyu
    Yu, Nenghai
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 4502 - 4506
  • [28] Feature Selection, Deep Neural Network and Trend Prediction
    Fang, Yan
    Journal of Shanghai Jiaotong University (Science), 2018, 23 (02) : 297 - 307
  • [30] Hadoop neural network for parallel and distributed feature selection
    Hodge, Victoria J.
    O'Keefe, Simon
    Austin, Jim
    NEURAL NETWORKS, 2016, 78 : 24 - 35