Scalable Model-Free Feature Screening via Sliced-Wasserstein Dependency

Authors
Li, Tao [1]
Yu, Jun [2]
Meng, Cheng [3]
Affiliations
[1] Renmin Univ China, Inst Stat & Big Data, Beijing, Peoples R China
[2] Beijing Inst Technol, Sch Math & Stat, Beijing, Peoples R China
[3] Renmin Univ China, Inst Stat & Big Data, Ctr Appl Stat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multivariate response model; Nonlinear model; Optimal transport; Sure screening; Variable selection; Simultaneous dimension reduction; Feature selection; Distance correlation; Microarray data; Regression
DOI
10.1080/10618600.2023.2183213
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We consider the model-free feature screening problem, which aims to discard non-informative features before downstream analysis. Most existing feature screening approaches have at least quadratic computational cost with respect to the sample size n, and thus may suffer from a heavy computational burden when n is large. To alleviate this burden, we propose a scalable model-free sure independence screening approach. This approach is based on the so-called sliced-Wasserstein dependency, a novel metric that measures the dependence between two random variables. Specifically, we quantify the dependence between two random variables by measuring the sliced-Wasserstein distance between their joint distribution and the product of their marginal distributions. For a predictor matrix of size n × d, the computational cost of the proposed algorithm is of order O(n log(n) d), even when the response variable is multivariate. Theoretically, we show that the proposed method enjoys both sure screening and rank consistency properties under mild regularity conditions. Numerical studies on various synthetic and real-world datasets demonstrate the superior performance of the proposed method in comparison with mainstream competitors, while requiring significantly less computational time. Supplementary materials for this article are available online.
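The idea in the abstract, comparing the joint distribution with the product of the marginals along random one-dimensional projections, can be illustrated with a rough sketch. This is not the authors' estimator: the function names, the use of a random permutation of the response to mimic the product of marginals, the standardization step, and the number of projections are all assumptions made for illustration. Each one-dimensional Wasserstein distance reduces to sorting, which is where the n log(n) factor comes from.

```python
import numpy as np


def sliced_wasserstein_dependency(x, y, n_projections=50, seed=0):
    """Illustrative Monte Carlo dependence score (hypothetical helper).

    Compares samples from the joint distribution of (x, y) with surrogate
    samples from the product of marginals, obtained here by permuting y
    (an assumption of this sketch, not necessarily the paper's estimator).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    # Standardize each column so the score does not depend on units.
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)
    y = (y - y.mean(axis=0)) / (y.std(axis=0) + 1e-12)

    joint = np.hstack([x, y])
    product = np.hstack([x, y[rng.permutation(len(y))]])

    # Random directions drawn uniformly on the unit sphere.
    theta = rng.normal(size=(joint.shape[1], n_projections))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)

    # For equal-size samples, the 1D 2-Wasserstein distance is the
    # root-mean-square difference of the sorted projections.
    proj_joint = np.sort(joint @ theta, axis=0)
    proj_product = np.sort(product @ theta, axis=0)
    return float(np.sqrt(np.mean((proj_joint - proj_product) ** 2)))


def screen_features(X, y, n_keep, **kwargs):
    """Rank the columns of X by the score above and keep the top n_keep."""
    scores = np.array(
        [sliced_wasserstein_dependency(X[:, j], y, **kwargs)
         for j in range(X.shape[1])]
    )
    order = np.argsort(-scores)
    return order[:n_keep], scores
```

With a fixed number of projections, each feature costs one sort of n projected values per direction, so screening all d features scales as n log(n) d up to the projection count, in line with the order stated in the abstract.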
Pages: 1501-1511
Page count: 11