Feature Selection using e-values

Cited by: 0
Authors
Majumdar, Subhabrata [1, 2]
Chatterjee, Snigdhansu [1]
Affiliations
[1] Univ Minnesota Twin Cities, Sch Stat, Minneapolis, MN 55455 USA
[2] Splunk, San Francisco, CA 94107 USA
Funding
National Science Foundation (USA)
Keywords
VARIABLE SELECTION; MODEL; REGRESSION; BOOTSTRAP; DEPTH; DIMENSION; LASSO;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the context of supervised parametric models, we introduce the concept of e-values. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e., the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. The e-values are applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure using e-values, and we provide consistency results for it. For a p-dimensional feature space, this procedure requires fitting only the full model and evaluating p + 1 models, as opposed to the traditional requirement of fitting and evaluating $2^p$ models. Through experiments across several model settings and on synthetic and real datasets, we establish the e-values method as a promising general alternative to existing model-specific methods of feature selection.
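The procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions that the abstract does not specify. It assumes a linear model fitted by least squares, an ordinary row-resampling bootstrap in place of the fast resampling algorithm, Mahalanobis depth as the data depth, and a hypothetical `scale` factor (defaulting to log n) that widens the bootstrap cloud around the full-model estimate. All function names are illustrative. Only the full model is ever refitted; each of the p reduced models is evaluated by setting one coefficient to zero in the full-model draws, giving p + 1 evaluations in total.

```python
import numpy as np


def mahalanobis_depth(points, reference):
    """Mahalanobis depth of each row of `points` relative to the cloud `reference`."""
    center = reference.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(reference, rowvar=False))
    diff = points - center
    sq_dist = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return 1.0 / (1.0 + sq_dist)


def bootstrap_cloud(X, y, beta_hat, n_boot, scale, rng):
    """Refit the full model on resampled rows, then widen the cloud by `scale`."""
    n, p = X.shape
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # sample rows with replacement
        draws[b] = np.linalg.lstsq(X[rows], y[rows], rcond=None)[0]
    # Assumption of this sketch: inflate the spread around the full-model estimate.
    return beta_hat + scale * (draws - beta_hat)


def select_features(X, y, n_boot=300, scale=None, seed=0):
    """Return (selected indices, full-model e-value, per-feature e-values)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scale = np.log(n) if scale is None else scale
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

    # Two independent clouds: one defines the depth function, one is evaluated.
    reference = bootstrap_cloud(X, y, beta_hat, n_boot, scale, rng)
    candidates = bootstrap_cloud(X, y, beta_hat, n_boot, scale, rng)

    # e-value here = mean depth of a model's draws w.r.t. the full-model cloud.
    full_e = mahalanobis_depth(candidates, reference).mean()
    e_values = np.empty(p)
    for j in range(p):
        reduced = candidates.copy()
        reduced[:, j] = 0.0  # drop feature j without refitting
        e_values[j] = mahalanobis_depth(reduced, reference).mean()

    # Keep feature j if dropping it pushes the e-value below the full model's.
    selected = [j for j in range(p) if e_values[j] < full_e]
    return selected, full_e, e_values


# Synthetic example: only the first two of five features carry signal.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + data_rng.normal(size=200)
selected, full_e, e_values = select_features(X, y)
print("selected:", selected)
print("full-model e-value:", full_e)
print("drop-one e-values:", e_values)
```

In this sketch, dropping an essential feature moves the coefficient draws far from the center of the full-model cloud, so the mean depth falls well below the full-model value. The comparison against a single threshold is a simplification chosen for illustration.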
Pages: 21