Robust distributed estimation and variable selection for massive datasets via rank regression

Authors
Jiaming Luan
Hongwei Wang
Kangning Wang
Benle Zhang
Affiliations
[1] Shandong Technology and Business University,
Keywords
Massive data; Robustness; Communication efficient; Variable selection
Abstract
Rank regression is a robust modeling tool, but it is challenging to implement on distributed massive data owing to memory constraints. In practice, massive data may also be distributed heterogeneously from machine to machine, and how to incorporate this heterogeneity is also of interest. This paper proposes a distributed rank regression ($\mathrm{DR}^{2}$), which can be implemented on the master machine by solving a weighted least-squares problem and which is adaptive when the data are heterogeneous. Theoretically, we prove that the resulting estimator is statistically as efficient as the global rank regression estimator. Furthermore, based on the adaptive LASSO and a newly defined distributed BIC-type tuning parameter selector, we propose a distributed regularized rank regression ($\mathrm{DR}^{3}$), which achieves consistent variable selection and can also be easily implemented using the LARS algorithm on the master machine. Simulation results and a real data analysis are included to validate our method.
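The abstract's idea of combining local rank estimates on the master through a weighted least-squares step can be illustrated with a toy sketch. This is not the paper's $\mathrm{DR}^{2}$ estimator: it assumes a single covariate, a Wilcoxon-type pairwise-difference loss on each machine, and weights equal to the local centered sum of squares of the covariate. All function names (`local_rank_slope`, `local_summary`, `master_combine`, `demo`) are hypothetical and chosen only for illustration.

```python
import math
import random


def weighted_median(values, weights):
    """Return a minimizer of sum_k weights[k] * |values[k] - b| over b."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def local_rank_slope(x, y):
    """Wilcoxon-type rank estimate of the slope b in y = a + b*x + error.

    Minimizes sum_{i<j} |(y_i - y_j) - b*(x_i - x_j)|, which equals a
    weighted median of the pairwise slopes with weights |x_i - x_j|.
    """
    slopes, weights = [], []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            if dx != 0.0:
                slopes.append((y[i] - y[j]) / dx)
                weights.append(abs(dx))
    return weighted_median(slopes, weights)


def local_summary(x, y):
    """Summary a worker sends to the master: (slope estimate, weight).

    Assumption: the weight is the centered sum of squares of x, a stand-in
    for the curvature information a real method would transmit.
    """
    mean_x = sum(x) / len(x)
    weight = sum((xi - mean_x) ** 2 for xi in x)
    return local_rank_slope(x, y), weight


def master_combine(summaries):
    """Closed-form minimizer of sum_k w_k * (b - b_k)**2 over b."""
    total = sum(w for _, w in summaries)
    return sum(b * w for b, w in summaries) / total


def demo(seed=0):
    """Simulate four machines with different covariate spreads."""
    rng = random.Random(seed)
    summaries = []
    for scale in (0.5, 1.0, 2.0, 4.0):  # heterogeneous covariate spread
        x = [rng.gauss(0.0, scale) for _ in range(60)]
        # Heavy-tailed Cauchy errors; the true slope is 2.0
        y = [1.0 + 2.0 * xi + math.tan(math.pi * (rng.random() - 0.5))
             for xi in x]
        summaries.append(local_summary(x, y))
    return master_combine(summaries)


if __name__ == "__main__":
    print(demo())
```

Only one pair of numbers per machine reaches the master in this sketch, which is the communication-efficient aspect the keywords point to; the multivariate, heterogeneity-adaptive weighting described in the abstract would need the construction given in the paper itself.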
Pages: 435 - 450
Number of pages: 15
Related papers
50 in total
  • [1] Robust distributed estimation and variable selection for massive datasets via rank regression
    Luan, Jiaming
    Wang, Hongwei
    Wang, Kangning
    Zhang, Benle
    [J]. ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2022, 74 (03) : 435 - 450
  • [2] Unified distributed robust regression and variable selection framework for massive data
    Wang, Kangning
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 186
  • [3] VARIABLE SELECTION AND COEFFICIENT ESTIMATION VIA REGULARIZED RANK REGRESSION
    Leng, Chenlei
    [J]. STATISTICA SINICA, 2010, 20 (01) : 167 - 181
  • [4] Robust communication-efficient distributed composite quantile regression and variable selection for massive data
    Wang, Kangning
    Li, Shaomin
    Zhang, Benle
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, 161
  • [5] Robust Signed-Rank Variable Selection in Linear Regression
    Abebe, Asheber
    Bindele, Huybrechts F.
    [J]. ROBUST RANK-BASED AND NONPARAMETRIC METHODS, 2016, 168 : 25 - 45
  • [6] Fast robust variable selection using VIF regression in large datasets
    Seo, Han Son
    [J]. KOREAN JOURNAL OF APPLIED STATISTICS, 2018, 31 (04) : 463 - 473
  • [7] Robust Variable Selection and Estimation in Threshold Regression Model
    Bo-wen Li
    Yun-qi Zhang
    Nian-sheng Tang
    [J]. Acta Mathematicae Applicatae Sinica, English Series, 2020, 36 : 332 - 346
  • [8] Robust Variable Selection and Estimation in Threshold Regression Model
    Bo-wen LI
    Yun-qi ZHANG
    Nian-sheng TANG
    [J]. Acta Mathematicae Applicatae Sinica, 2020, 36 (02) : 332 - 346
  • [9] Robust Variable Selection and Estimation in Threshold Regression Model
    Li, Bo-wen
    Zhang, Yun-qi
    Tang, Nian-sheng
    [J]. ACTA MATHEMATICAE APPLICATAE SINICA-ENGLISH SERIES, 2020, 36 (02) : 332 - 346
  • [10] Robust estimation and variable selection in heteroscedastic linear regression
    Gijbels, I.
    Vrinssen, I.
    [J]. STATISTICS, 2019, 53 (03) : 489 - 532