Distribution-free and model-free multivariate feature screening via multivariate rank distance correlation

Cited by: 4
Authors
Zhao, Shaofei [1]
Fu, Guifang [1]
Affiliations
[1] Binghamton Univ, Dept Math Sci, Vestal, NY 13850 USA
Keywords
Distance correlation; Feature screening; Multivariate rank; Sure screening property; Ultrahigh dimensional data analysis; Varying coefficient models; Variable selection
DOI
10.1016/j.jmva.2022.105081
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Feature screening approaches are effective in selecting active features from data with ultrahigh dimensionality and increasing complexity; however, many existing feature screening approaches are either restricted to a univariate response or rely on some distribution or model assumptions. In this article, we propose a sure independence screening approach based on the multivariate rank distance correlation (MrDc-SIS). The MrDc-SIS achieves multiple desirable properties such as being distribution-free, completely nonparametric, scale-free, and robust to outliers or heavy tails. Moreover, the MrDc-SIS can be used to screen either univariate or multivariate responses and either one-dimensional or multi-dimensional predictors. We establish the theoretical sure screening and rank consistency properties of the MrDc-SIS approach under a mild condition by lifting previous assumptions about the finite moments. Simulation studies demonstrate that MrDc-SIS outperforms eight other closely relevant approaches under certain settings. We also apply the MrDc-SIS approach to a multi-omics ovarian carcinoma dataset downloaded from The Cancer Genome Atlas (TCGA). © 2022 Elsevier Inc. All rights reserved.
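As a rough illustration of the screening idea summarized in the abstract, the sketch below ranks each predictor by a distance correlation computed on rank-transformed data and keeps the top d. It is not the authors' MrDc-SIS: the abstract does not define the multivariate rank, so this sketch substitutes simple componentwise ranks as a stand-in, and every function name (`distance_correlation`, `componentwise_ranks`, `rank_dc_screen`) and the cutoff parameter `d` are hypothetical.

```python
import numpy as np


def _centered_dist(a):
    """Double-centered Euclidean distance matrix of the rows of `a`."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between x and y."""
    A, B = _centered_dist(x), _centered_dist(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def componentwise_ranks(a):
    """Assumption: column-wise ranks scaled to (0, 1], a simplified
    stand-in for the multivariate rank used in the paper."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    n = a.shape[0]
    return (np.argsort(np.argsort(a, axis=0), axis=0) + 1) / n


def rank_dc_screen(X, Y, d):
    """Score each column of X against Y and return the indices of the
    d highest-scoring columns together with all scores."""
    Ry = componentwise_ranks(Y)
    scores = np.array([
        distance_correlation(componentwise_ranks(X[:, j]), Ry)
        for j in range(X.shape[1])
    ])
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores
```

Because only ranks enter the statistic, the scores in this sketch are unchanged by strictly monotone transformations of each coordinate, which loosely mirrors the scale-free property claimed in the abstract. `Y` may be a vector or a matrix with several response columns.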
Pages: 19