Scalable Model-Free Feature Screening via Sliced-Wasserstein Dependency

Authors
Li, Tao [1]
Yu, Jun [2]
Meng, Cheng [3]
Affiliations
[1] Renmin Univ China, Inst Stat & Big Data, Beijing, Peoples R China
[2] Beijing Inst Technol, Sch Math & Stat, Beijing, Peoples R China
[3] Renmin Univ China, Inst Stat & Big Data, Ctr Appl Stat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multivariate response model; Nonlinear model; Optimal transport; Sure screening; Variable selection; Simultaneous dimension reduction; Feature selection; Distance correlation; Microarray data; Regression
DOI
10.1080/10618600.2023.2183213
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We consider the model-free feature screening problem, which aims to discard non-informative features before downstream analysis. Most existing feature screening approaches have at least quadratic computational cost with respect to the sample size n and may therefore suffer from a heavy computational burden when n is large. To alleviate this burden, we propose a scalable model-free sure independence screening approach. The approach is based on the so-called sliced-Wasserstein dependency, a novel metric that measures the dependence between two random variables. Specifically, we quantify the dependence between two random variables by measuring the sliced-Wasserstein distance between their joint distribution and the product of their marginal distributions. For a predictor matrix of size n × d, the computational cost of the proposed algorithm is of order O(n log(n) d), even when the response variable is multivariate. Theoretically, we show that the proposed method enjoys both sure screening and rank consistency properties under mild regularity conditions. Numerical studies on various synthetic and real-world datasets demonstrate the superior performance of the proposed method in comparison with mainstream competitors, while requiring significantly less computational time. Supplementary materials for this article are available online.
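The idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' estimator: the record does not give their slicing scheme, normalization, or screening threshold, so everything below is an assumption made for illustration. In particular, the function names `sliced_wasserstein_dependence` and `screen_features`, the use of a random permutation to mimic a sample from the product of the marginals, the standardization step, the squared 2-Wasserstein cost, and the default of 50 random projections are all hypothetical choices.

```python
import numpy as np


def sliced_wasserstein_dependence(x, y, n_projections=50, rng=None):
    """Illustrative Monte Carlo dependence score between x and y.

    Assumption: the product of the marginals is approximated by randomly
    permuting x, which breaks the pairing between x and y.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = x.shape[0]

    # Standardize each coordinate so none dominates the projections (assumption).
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)
    y = (y - y.mean(axis=0)) / (y.std(axis=0) + 1e-12)

    joint = np.hstack([x, y])                        # sample from the joint law
    product = np.hstack([x[rng.permutation(n)], y])  # approx. product of marginals

    # Random directions drawn uniformly on the unit sphere.
    theta = rng.standard_normal((joint.shape[1], n_projections))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)

    # For two 1-D samples of equal size, the squared 2-Wasserstein distance is
    # the mean squared difference of the sorted values: O(n log n) per slice.
    proj_joint = np.sort(joint @ theta, axis=0)
    proj_product = np.sort(product @ theta, axis=0)
    return float(np.mean((proj_joint - proj_product) ** 2))


def screen_features(X, y, n_keep, n_projections=50, seed=0):
    """Rank the columns of X by the score above and keep the top n_keep."""
    rng = np.random.default_rng(seed)
    scores = np.array([
        sliced_wasserstein_dependence(X[:, j], y, n_projections, rng)
        for j in range(X.shape[1])
    ])
    keep = np.argsort(scores)[::-1][:n_keep]  # indices, largest score first
    return keep, scores
```

With a fixed number of projections, each feature costs one sort per slice, so the sketch runs in time proportional to n log(n) d, which matches the order quoted in the abstract. Passing a two-dimensional `y` handles a multivariate response without changing the code.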
Pages: 1501-1511
Number of pages: 11