Wasserstein filter for variable screening in binary classification in the reproducing kernel Hilbert space

Cited by: 0
Authors
Jeong, Sanghun [1]
Kim, Choongrak [1]
Yang, Hojin [1]
Affiliations
[1] Pusan Natl Univ, Dept Stat, Busan, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Distributional difference; feature map; l(2) distance; reproducing kernel; sure screening property; GENE; DEPENDENCE; SELECTION;
DOI
10.1080/10485252.2023.2235430
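Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code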
020208 ; 070103 ; 0714 ;
Abstract
The aim of this paper is to develop a marginal screening method for high-dimensional binary classification, based on the Wasserstein distance, that accounts for distributional differences. Many existing screening methods, such as the two-sample t-test and the Kolmogorov test, have been developed under parametric or nonparametric modeling assumptions to reduce the dimension of the predictors. However, such modeling specifications or nonparametric approaches are tied to the probability measure induced by the predictor in a Euclidean space. While many machine learning methods have successfully found nonlinear decision boundaries in a transformed space, the reproducing kernel Hilbert space (RKHS), we consider the Wasserstein filter's capacity to detect the distributional difference between two probability measures induced by a nonlinear function of the predictor in the RKHS. We can thereby flexibly filter out the non-informative predictors for the binary classification and avoid the modeling assumptions required in a Euclidean space. We prove that the Wasserstein filter satisfies the sure screening property under some mild conditions. We also demonstrate the advantages of the proposed approach by comparing its finite-sample performance with that of existing methods through simulation studies and an application to lung cancer data.
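The general idea of marginal screening by a distributional distance can be illustrated with a minimal sketch. This is not the authors' RKHS-based Wasserstein filter: it assumes the plain one-dimensional Wasserstein-1 distance between the two class-conditional empirical distributions of each predictor, computed in Euclidean space. The function names `wasserstein_1d` and `screen_predictors`, and the optional `transform` argument standing in for a nonlinear map of the predictor, are hypothetical and do not come from the paper.

```python
import numpy as np


def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two 1-D empirical distributions.

    Computed as the integral of |F_x - F_y| over the pooled sample points.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    pooled = np.sort(np.concatenate([x, y]))
    gaps = np.diff(pooled)
    # Empirical CDFs evaluated at every pooled point except the last one.
    cdf_x = np.searchsorted(x, pooled[:-1], side="right") / x.size
    cdf_y = np.searchsorted(y, pooled[:-1], side="right") / y.size
    return float(np.sum(np.abs(cdf_x - cdf_y) * gaps))


def screen_predictors(X, labels, keep, transform=None):
    """Rank predictors by the distance between their class-conditional samples.

    X: (n_samples, n_predictors) array; labels: array of 0/1 class labels.
    transform: optional elementwise map applied to each predictor first
    (an illustrative assumption, not the paper's feature map).
    Returns the indices of the `keep` highest-scoring predictors and all scores.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    if transform is not None:
        X = transform(X)
    scores = np.array([
        wasserstein_1d(X[labels == 0, j], X[labels == 1, j])
        for j in range(X.shape[1])
    ])
    ranked = np.argsort(scores)[::-1]  # largest distance first
    return ranked[:keep], scores
```

Predictors whose class-conditional distributions barely differ receive scores near zero and are filtered out, while the `keep` predictors with the largest distances are retained for the downstream classifier.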
Pages: 623-642
Number of pages: 20