Unified distributed robust regression and variable selection framework for massive data

Cited by: 3
Author
Wang, Kangning [1]
Affiliation
[1] Shandong Technol & Business Univ, Sch Stat, Yantai 264005, Peoples R China
Keywords
Distributed massive data; Robust regression; Communication efficiency; Variable selection; NONCONCAVE PENALIZED LIKELIHOOD; COMPRESSION; SHRINKAGE; ALGORITHM;
DOI
10.1016/j.eswa.2021.115701
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a unified robust regression framework for distributed massive data that covers many robust regression methods in a single setting. Specifically, we first transform the different types of robust regression into an asymptotically equivalent least-squares problem. The resulting estimator can then be computed as a weighted average of robust local estimators, which keeps the communication cost low because only one round of communication is required. In addition, because the local data information is fully incorporated, the estimator adapts to heterogeneity. The new estimator is proven to be equivalent to the corresponding global robust regression estimator. Furthermore, we perform variable selection based on the unified robust regression framework and the adaptive LASSO, and the solution path can be conveniently obtained with the LARS algorithm. We show theoretically that, with a new distributed BIC-type tuning parameter selector, the new variable selection method selects the truly relevant variables consistently. Simulation results confirm the effectiveness of the new methods and the correctness of the theoretical results.
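The one-round scheme in the abstract (robust local fits combined by a weighted average) can be illustrated with a minimal sketch. This is not the paper's estimator: the abstract does not state the weights or the local loss, so the sketch assumes Huber regression as the local robust fit and uses each machine's Gram matrix X_k^T X_k as its weight. The names `huber_irls` and `one_shot_combine`, the tuning constant 1.345, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def huber_irls(X, y, delta=1.345, n_iter=50):
    """Local robust fit: Huber regression by iteratively reweighted least squares.

    Assumption: Huber loss stands in for whichever robust loss a machine uses.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale from the median absolute deviation.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        # Huber weights: 1 inside the threshold, delta / u outside it.
        w = delta / np.maximum(u, delta)
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta


def one_shot_combine(local_betas, local_weights):
    """Master step: matrix-weighted average of the local estimates.

    Each machine sends only (beta_k, W_k), so one communication round suffices.
    Assumption: W_k = X_k^T X_k; the paper's actual weights may differ.
    """
    total = sum(local_weights)
    weighted = sum(W @ b for W, b in zip(local_weights, local_betas))
    return np.linalg.solve(total, weighted)


# Simulated data split across 10 machines, with heavy-tailed noise.
rng = np.random.default_rng(0)
true_beta = np.array([2.0, -1.0, 0.0, 0.5])
local_betas, local_weights = [], []
for _ in range(10):
    X = rng.normal(size=(500, 4))
    y = X @ true_beta + rng.standard_t(df=2, size=500)
    local_betas.append(huber_irls(X, y))
    local_weights.append(X.T @ X)

beta_hat = one_shot_combine(local_betas, local_weights)
print(np.round(beta_hat, 3))
```

Each machine transmits a p-vector and a p-by-p matrix, so the communication cost depends on the number of covariates and not on the local sample size.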
Pages: 11