Unified distributed robust regression and variable selection framework for massive data

Cited by: 3
Authors
Wang, Kangning [1]
Affiliations
[1] Shandong Technol & Business Univ, Sch Stat, Yantai 264005, Peoples R China
Keywords
Distributed massive data; Robust regression; Communication efficiency; Variable selection; Nonconcave penalized likelihood; Compression; Shrinkage; Algorithm
DOI
10.1016/j.eswa.2021.115701
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a unified robust regression framework for distributed massive data that covers many robust regression methods in a single setting. Specifically, we first transform different types of robust regression into an asymptotically equivalent least-squares problem. The resulting estimator can then be computed as a weighted average of robust local estimators, which reduces the communication cost because only one round of communication is required. In addition, because the local data information is fully incorporated, the estimator adapts to heterogeneity. The new estimator is proven to be equivalent to the corresponding global robust regression estimator. Furthermore, we conduct variable selection based on the unified robust regression framework and the adaptive LASSO, and the solution path can be conveniently obtained with the LARS algorithm. It is shown theoretically that the new variable selection method, using a new distributed BIC-type tuning parameter selector, consistently selects the true relevant variables. Simulation results confirm the effectiveness of the new methods and the correctness of the theoretical results.
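The one-round "weighted average of robust local estimators" idea in the abstract can be illustrated with a minimal sketch. This is not the paper's exact estimator: the Huber loss, the IRLS solver, the choice of local Gram matrices X_k^T X_k as weights, and the names `huber_irls` and `one_shot_combine` are all assumptions made here for illustration. Each simulated machine fits a robust regression locally and sends only its coefficient vector and one p x p matrix to the coordinator.

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients via iteratively
    reweighted least squares (illustrative local robust estimator)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale: normalized median absolute deviation.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        # Huber weights: 1 inside the threshold, delta/|u| outside.
        w = np.where(u <= delta, 1.0, delta / np.maximum(u, 1e-12))
        XtW = X.T * w  # scales each observation's column of X.T by its weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def one_shot_combine(local_betas, local_grams):
    """Matrix-weighted average (sum_k G_k)^{-1} sum_k G_k beta_k.
    Using Gram matrices as weights is an assumption of this sketch."""
    G_total = sum(local_grams)
    rhs = sum(G @ b for G, b in zip(local_grams, local_betas))
    return np.linalg.solve(G_total, rhs)

# Simulated setting: K machines, each holding n_k heavy-tailed observations.
rng = np.random.default_rng(0)
beta_true = np.array([2.0, -1.0, 0.0, 0.5])
K, n_k = 10, 500

local_betas, local_grams = [], []
for _ in range(K):
    X = rng.normal(size=(n_k, beta_true.size))
    y = X @ beta_true + rng.standard_t(df=3, size=n_k)  # heavy-tailed noise
    local_betas.append(huber_irls(X, y))  # computed on the local machine
    local_grams.append(X.T @ X)           # summary sent along with beta_k

# Single round of communication: the coordinator combines the summaries.
beta_hat = one_shot_combine(local_betas, local_grams)
print(np.round(beta_hat, 3))
```

Because the combined estimate solves a least-squares-type problem in the local summaries, a penalized version (for example an adaptive LASSO fitted along a LARS path, as the abstract describes) could in principle be run on the coordinator without further communication; that step is not shown here.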
Pages: 11
Related papers
50 in total
  • [31] Robust variable selection for mixture linear regression models
    Jiang, Yunlu
    HACETTEPE JOURNAL OF MATHEMATICS AND STATISTICS, 2016, 45 (02): : 549 - 559
  • [32] Robust Variable Selection and Estimation in Threshold Regression Model
    Li, Bo-wen
    Zhang, Yun-qi
    Tang, Nian-sheng
    ACTA MATHEMATICAE APPLICATAE SINICA-ENGLISH SERIES, 2020, 36 (02): : 332 - 346
  • [33] Robust estimation and variable selection in heteroscedastic linear regression
    Gijbels, I.
    Vrinssen, I.
    STATISTICS, 2019, 53 (03) : 489 - 532
  • [34] Robust variable selection for finite mixture regression models
    Tang, Qingguo
    Karunamuni, R. J.
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2018, 70 (03) : 489 - 521
  • [35] Optimal/robust distributed data fusion: a unified approach
    Mahler, R
    SIGNAL PROCESSING, SENSOR FUSION, AND TARGET RECOGNITION IX, 2000, 4052 : 128 - 138
  • [36] Adaptive distributed support vector regression of massive data
    Liang, Shu-na
    Sun, Fei
    Zhang, Qi
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2024, 53 (09) : 3365 - 3382
  • [37] Distributed optimal subsampling for quantile regression with massive data
    Chao, Yue
    Ma, Xuejun
    Zhu, Boya
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2024, 233
  • [38] Distributed optimization for penalized regression in massive compositional data
    Chao, Yue
    Huang, Lei
    Ma, Xuejun
    APPLIED MATHEMATICAL MODELLING, 2025, 141
  • [39] VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA
    Garcia, Ramon I.
    Ibrahim, Joseph G.
    Zhu, Hongtu
    STATISTICA SINICA, 2010, 20 (01) : 149 - 165
  • [40] A unified framework for contrast research of the latent variable multivariate regression methods
    He, Zhangming
    Zhou, Haiyin
    Wang, Jiongqi
    Zhai, Shouchao
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2015, 143 : 136 - 145