Feature Selection using e-values

Cited by: 0
Authors
Majumdar, Subhabrata [1 ,2 ]
Chatterjee, Snigdhansu [1 ]
Affiliations
[1] Univ Minnesota Twin Cities, Sch Stat, Minneapolis, MN 55455 USA
[2] Splunk, San Francisco, CA 94107 USA
Funding
National Science Foundation (USA);
Keywords
VARIABLE SELECTION; MODEL; REGRESSION; BOOTSTRAP; DEPTH; DIMENSION; LASSO;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the context of supervised parametric models, we introduce the concept of e-values. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e., the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. The e-values are applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure using e-values, and we provide consistency results. For a p-dimensional feature space, this procedure requires fitting only the full model and evaluating p + 1 models, as opposed to the traditional requirement of fitting and evaluating 2^p models. Through experiments across several model settings and on synthetic and real datasets, we establish the e-values method as a promising general alternative to existing model-specific methods of feature selection.
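The abstract's "fit the full model once, evaluate p + 1 models" idea can be illustrated with a minimal Python sketch. It is not the authors' implementation; everything specific in it is an assumption, since the abstract names only "data depths" and "a fast resampling-based algorithm". The assumptions are: a linear model fitted by least squares, Mahalanobis depth as the data depth, a multiplier (score-perturbation) bootstrap as the resampling scheme, a hypothetical spread parameter `tau`, and a selection rule that keeps a feature when dropping it lowers the e-value below that of the full model. The function names are illustrative.

```python
import numpy as np


def mahalanobis_depth(points, reference):
    """Mahalanobis depth of each row of `points` relative to the sample `reference`."""
    center = reference.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(reference, rowvar=False))
    diff = points - center
    # Squared Mahalanobis distance of every row, computed in one pass.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)


def evalue_feature_selection(X, y, n_boot=500, tau=None, seed=0):
    """Sketch of e-value-style selection for a linear model (assumptions as stated above).

    Returns (selected feature indices, full-model e-value, drop-one e-values).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if tau is None:
        tau = np.log(n)  # hypothetical default for the resampling spread

    # The only model fit: ordinary least squares on all features.
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat

    def draw(size):
        # Multiplier bootstrap: perturb the score with random weights,
        # so no model is refitted on resampled data.
        w = rng.standard_normal((size, n))
        return beta_hat + tau * ((w * resid) @ X @ xtx_inv)

    reference = draw(n_boot)   # approximates the full-model sampling distribution
    candidates = draw(n_boot)  # independent draws that are evaluated against it

    # Evaluation 1 of p + 1: the full model itself.
    e_full = mahalanobis_depth(candidates, reference).mean()

    # Evaluations 2 .. p + 1: the full model with one coefficient set to zero.
    e_drop = np.empty(p)
    for j in range(p):
        reduced = candidates.copy()
        reduced[:, j] = 0.0
        e_drop[j] = mahalanobis_depth(reduced, reference).mean()

    # Keep a feature when removing it pushes the estimates away from the
    # full-model distribution, i.e. lowers the e-value.
    selected = np.flatnonzero(e_drop < e_full)
    return selected, e_full, e_drop
```

Calling `evalue_feature_selection(X, y)` on an `(n, p)` design matrix and a length-`n` response returns the indices of the retained features along with the e-values used to rank them. The loop runs p times and reuses the same resampled estimates throughout, which is the point of the p + 1 count in the abstract: no subset model is ever fitted.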
Pages: 21