In the context of supervised parametric models, we introduce the concept of e-values. An e-value is a scalar quantity that measures how close the sampling distribution of parameter estimates from a model trained on a subset of features is to that of the model trained on all features (i.e., the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. E-values are applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure based on e-values, and we provide consistency results for it. For a p-dimensional feature space, this procedure requires fitting only the full model and evaluating p + 1 models, as opposed to the traditional requirement of fitting and evaluating 2^p models. Through experiments across several model settings and on synthetic and real datasets, we establish the e-values method as a promising general alternative to existing model-specific methods of feature selection.
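The procedure summarized above can be illustrated with a small, dependency-free sketch for a linear model. Several choices here are our own assumptions rather than details taken from the abstract: ordinary least squares as the parametric model, a plain nonparametric bootstrap standing in for the fast resampling scheme, Mahalanobis depth as the data depth, a hypothetical inflation factor `tau` that widens the bootstrap spread, and the selection rule "keep feature j if dropping it pushes the e-value below that of the full model". We take the e-value of a model to be the mean depth of its bootstrap estimates relative to the full model's bootstrap distribution, which is one reading of the "proximity" described above. All function names (`solve`, `ols`, `mahalanobis_depth`, `e_value_selection`) are illustrative only.

```python
import random


def solve(a, b):
    """Solve a @ x = b for a square matrix a (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X^T X beta = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def mean_cov(samples):
    """Sample mean vector and covariance matrix (normalised by the sample count)."""
    B, p = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / B for j in range(p)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / B
            for j in range(p)] for i in range(p)]
    return mu, cov


def mahalanobis_depth(x, mu, cov):
    """Mahalanobis depth: 1 / (1 + (x - mu)^T cov^{-1} (x - mu))."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, d)
    return 1.0 / (1.0 + sum(di * zi for di, zi in zip(d, z)))


def e_value_selection(X, y, n_boot=200, tau=3.0, seed=0):
    """Sketch: return (e_full, e_drop, selected) for a linear model."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    boot = []
    for _ in range(n_boot):  # only the full model is ever refit
        idx = [rng.randrange(n) for _ in range(n)]
        boot.append(ols([X[i] for i in idx], [y[i] for i in idx]))
    mu, _ = mean_cov(boot)
    # Assumed inflation step: widen the bootstrap spread around its mean by tau.
    boot = [[m + tau * (b - m) for b, m in zip(row, mu)] for row in boot]
    mu, cov = mean_cov(boot)

    def mean_depth(points):
        # Average depth of the points w.r.t. the full-model bootstrap distribution.
        return sum(mahalanobis_depth(q, mu, cov) for q in points) / len(points)

    e_full = mean_depth(boot)
    e_drop = []
    for j in range(p):  # p reduced models, so p + 1 evaluations in total
        # Model without feature j: full-model estimates with coordinate j set to 0.
        dropped = [row[:j] + [0.0] + row[j + 1:] for row in boot]
        e_drop.append(mean_depth(dropped))
    selected = [j for j in range(p) if e_drop[j] < e_full]
    return e_full, e_drop, selected


# Synthetic example: only features 0 and 1 affect y; features 2 and 3 are noise.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3 * r[0] - 2 * r[1] + rng.gauss(0, 1) for r in X]
e_full, e_drop, selected = e_value_selection(X, y)
print(round(e_full, 3), [round(e, 3) for e in e_drop], selected)
```

Zeroing one coordinate of the full-model bootstrap estimates avoids refitting any reduced model, which is what keeps the count at p + 1 evaluations instead of 2^p. In this synthetic setup the strong features 0 and 1 should land in `selected`; whether the noise features 2 and 3 are excluded depends on the random draw and on the assumed `tau`.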
Affiliation:
Boston Univ, Sch Publ Hlth, Dept Epidemiol & Global Hlth, Boston, MA 02215 USA
Fox, Matthew P.
Arah, Onyebuchi A.
Affiliations:
Univ Calif Los Angeles, Dept Epidemiol, Fielding Sch Publ Hlth, Los Angeles, CA USA
Univ Calif Los Angeles, Dept Stat, Coll Letters & Sci, Los Angeles, CA USA
Affiliations:
Univ Chicago, Dept Stat, 5747 South Ellis Ave, Chicago, IL 60637 USA
Univ Chicago, Data Sci Inst, 5747 South Ellis Ave, Chicago, IL 60637 USA
Ignatiadis, Nikolaos
Wang, Ruodu
Affiliation:
Univ Waterloo, Dept Stat & Actuarial Sci, 200 Univ Ave West, Waterloo, ON N2L 3G1, Canada