Unified distributed robust regression and variable selection framework for massive data

Cited by: 3
Authors
Wang, Kangning [1]
Affiliation
[1] Shandong Technol & Business Univ, Sch Stat, Yantai 264005, Peoples R China
Keywords
Distributed massive data; Robust regression; Communication efficiency; Variable selection; NONCONCAVE PENALIZED LIKELIHOOD; COMPRESSION; SHRINKAGE; ALGORITHM;
DOI
10.1016/j.eswa.2021.115701
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a unified robust regression framework for distributed massive data that covers many types of robust regression in one setting. Specifically, we first transform different types of robust regression into an asymptotically equivalent least-squares problem. The resulting estimator can then be computed as a weighted average of robust local estimators, and the communication cost is low because only one round of communication is required. In addition, since the local data information is fully incorporated, the estimator is adaptive to heterogeneity. The new estimator is proven to be asymptotically equivalent to the corresponding global robust regression estimator. Furthermore, we conduct variable selection based on the unified robust regression framework and the adaptive LASSO, and the solution path can be conveniently obtained by the LARS algorithm. It is shown theoretically that, with a new distributed BIC-type tuning parameter selector, the new variable selection method consistently selects the truly relevant variables. The simulation results confirm the effectiveness of the new methods and the correctness of the theoretical results.
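The one-round aggregation described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the choice of Huber regression as the local robust estimator, the IRLS solver, the MAD scale estimate, and the use of each machine's Gram matrix X_k^T X_k / n_k as the weight are all assumptions made for illustration, and the function names (`huber_irls`, `one_shot_aggregate`, `simulate`) are hypothetical. Each machine sends only its local estimate and a p x p weight matrix, so a single round of communication suffices.

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50):
    """Local Huber regression via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust scale from the median absolute deviation (assumed choice).
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        # Huber weights: 1 inside the threshold, delta/|u| outside.
        w = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta

def one_shot_aggregate(local_betas, local_weights):
    """Matrix-weighted average: (sum_k W_k)^{-1} sum_k W_k beta_k."""
    W_sum = sum(local_weights)
    Wb_sum = sum(W @ b for W, b in zip(local_weights, local_betas))
    return np.linalg.solve(W_sum, Wb_sum)

def simulate(seed=0, n_machines=10, n_local=500):
    """Simulated data split across machines, with heavy-tailed errors."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([2.0, -1.0, 0.0, 0.5])
    betas, weights = [], []
    for _ in range(n_machines):
        X = rng.normal(size=(n_local, beta_true.size))
        y = X @ beta_true + rng.standard_t(df=3, size=n_local)
        betas.append(huber_irls(X, y))       # computed locally
        weights.append(X.T @ X / n_local)    # assumed weight matrix
    # Only (beta_k, W_k) leave each machine: one communication round.
    return beta_true, one_shot_aggregate(betas, weights)

beta_true, beta_hat = simulate()
```

Under these assumptions, `beta_hat` should land close to `beta_true` despite the heavy-tailed noise. The weighted-average form is the solution of a combined least-squares problem, which is what would let a penalty such as the adaptive LASSO be applied to the aggregated quantities without revisiting the raw data.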
Pages: 11