Feature Selection using e-values

Authors
Majumdar, Subhabrata [1, 2]
Chatterjee, Snigdhansu [1]
Affiliations
[1] Univ Minnesota Twin Cities, Sch Stat, Minneapolis, MN 55455 USA
[2] Splunk, San Francisco, CA 94107 USA
Funding
National Science Foundation (USA);
Keywords
VARIABLE SELECTION; MODEL; REGRESSION; BOOTSTRAP; DEPTH; DIMENSION; LASSO;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the context of supervised parametric models, we introduce the concept of e-values. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e., the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. The e-values are applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure using e-values, and we provide consistency results. For a p-dimensional feature space, this procedure requires fitting only the full model and evaluating p + 1 models, as opposed to the traditional requirement of fitting and evaluating 2^p models. Through experiments across several model settings and synthetic and real datasets, we establish the e-values method as a promising general alternative to existing model-specific methods of feature selection.
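The abstract does not spell out the algorithm, so the following is only a rough sketch of the idea it describes, under several assumptions that are not taken from the record: an ordinary least-squares model, Mahalanobis depth as the data depth, Gaussian draws around the full-model estimate standing in for the resampling step, a hypothetical spread parameter `tau`, and a selection rule that keeps a feature when zeroing its coefficient lowers the e-value below that of the full model. The function names `mahalanobis_depth` and `select_features` are illustrative. The full model is fitted once, and p + 1 e-values are evaluated (the full model plus p drop-one models).

```python
import numpy as np


def mahalanobis_depth(points, reference):
    """Mahalanobis depth of each row of `points` relative to the sample `reference`."""
    center = reference.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    diff = points - center
    dist_sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + dist_sq)


def select_features(X, y, n_resamples=1000, tau=5.0, seed=0):
    """Illustrative e-value style selection for OLS (assumes p >= 2 features).

    `tau` is a hypothetical tuning parameter that widens the resampling
    distribution; it is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Fit the full model once.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma_sq = resid @ resid / (n - p)
    cov = sigma_sq * np.linalg.inv(X.T @ X)

    # Stand-in for resampling: two independent sets of Gaussian draws
    # centered at the full-model estimate (an assumption, not a bootstrap).
    reference = rng.multivariate_normal(beta_hat, tau**2 * cov, size=n_resamples)
    draws = rng.multivariate_normal(beta_hat, tau**2 * cov, size=n_resamples)

    # e-value of the full model: mean depth of its draws.
    e_full = mahalanobis_depth(draws, reference).mean()

    # e-value of each drop-one model: zero out coefficient j, no refitting.
    e_drop = np.empty(p)
    for j in range(p):
        dropped = draws.copy()
        dropped[:, j] = 0.0
        e_drop[j] = mahalanobis_depth(dropped, reference).mean()

    # Keep features whose removal pushes the e-value below the full model's.
    selected = np.flatnonzero(e_drop < e_full)
    return selected, e_full, e_drop
```

Calling `select_features(X, y)` on a design matrix `X` of shape (n, p) and a response `y` returns the indices of the retained features together with the full-model and drop-one e-values. Only one least-squares fit is performed, and the loop costs p depth evaluations rather than a search over 2^p submodels.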
Pages: 21