We consider the model-free feature screening problem, which aims to discard non-informative features before downstream analysis. Most existing feature screening approaches have at least quadratic computational cost with respect to the sample size n, and thus may suffer from a heavy computational burden when n is large. To alleviate this burden, we propose a scalable model-free sure independence screening approach. The approach is based on the so-called sliced-Wasserstein dependency, a novel metric that measures the dependence between two random variables. Specifically, we quantify the dependence between two random variables by the sliced-Wasserstein distance between their joint distribution and the product of their marginal distributions. For a predictor matrix of size n × d, the computational cost of the proposed algorithm is of order O(nd log n), even when the response variable is multivariate. Theoretically, we show that the proposed method enjoys both the sure screening and rank consistency properties under mild regularity conditions. Numerical studies on various synthetic and real-world datasets demonstrate the superior performance of the proposed method in comparison with mainstream competitors, while requiring significantly less computational time. Supplementary materials for this article are available online.
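The dependence measure described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes (i) the product of the marginals is approximated by randomly permuting the feature values, (ii) the sliced distance is a Monte Carlo average of one-dimensional 1-Wasserstein distances over random unit directions, and (iii) variables are standardized first. The function names `sliced_wasserstein_dependence` and `screen_features` and the parameter `n_projections` are hypothetical. Each feature costs one sort per projection, so for a fixed number of projections the total cost scales as n log(n) d.

```python
import numpy as np


def sliced_wasserstein_dependence(x, y, n_projections=50, rng=None):
    """Monte Carlo sliced 1-Wasserstein distance between the joint sample
    (x, y) and a surrogate sample from the product of the marginals.

    Assumption: the product of marginals is emulated by permuting x.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    # Standardize each column so that no variable dominates by scale alone.
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)
    y = (y - y.mean(axis=0)) / (y.std(axis=0) + 1e-12)

    joint = np.hstack([x, y])                          # sample from the joint
    indep = np.hstack([x[rng.permutation(len(x))], y]) # surrogate for the product

    # Random directions on the unit sphere, one per column.
    dirs = rng.standard_normal((joint.shape[1], n_projections))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)

    # For equal-size 1-D samples, the 1-Wasserstein distance is the mean
    # absolute difference of the sorted values; sorting costs O(n log n).
    proj_joint = np.sort(joint @ dirs, axis=0)
    proj_indep = np.sort(indep @ dirs, axis=0)
    return float(np.mean(np.abs(proj_joint - proj_indep)))


def screen_features(X, y, k, n_projections=50, seed=0):
    """Score every column of X against y and return the indices of the
    k highest-scoring columns together with all scores."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    scores = np.array([
        sliced_wasserstein_dependence(X[:, j], y, n_projections, rng)
        for j in range(X.shape[1])
    ])
    top = np.argsort(scores)[::-1][:k]
    return top, scores
```

Calling `screen_features(X, Y, k)` with an n × d array `X` and a response `Y` of shape (n,) or (n, q) returns the retained column indices in decreasing order of score. Because the response only enters through the stacked sample, a multivariate response changes the dimension of the projection directions but not the sort-based cost per feature.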