Residual projection for quantile regression in vertically partitioned big data

Cited by: 1
Authors
Fan, Ye [1]
Li, Jr-Shin [2]
Lin, Nan [3]
Affiliations
[1] Capital Univ Econ & Business, Sch Stat, Beijing 100070, Peoples R China
[2] Washington Univ St Louis, Dept Elect & Syst Engn, St Louis, MO 63130 USA
[3] Washington Univ St Louis, Dept Math & Stat, St Louis, MO 63130 USA
Keywords
ADMM; Parallel framework; Privacy preservation; Quantile regression; Residual projection; Vertically distributed big data; COORDINATE DESCENT; ALGORITHMS; SELECTION;
DOI
10.1007/s10618-022-00914-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Standard regression techniques model only the mean of the response variable. Quantile regression (QR) is more powerful in that it depicts a comprehensive relationship between the response variable and independent covariates at different quantiles. It is particularly useful for non-normally distributed data with skewness or heterogeneity, which appear routinely in many scientific fields, such as economics, finance, public health and biology. Although its theory has been well developed in the literature, its computation in big data still faces multiple challenges, especially for vertically stored big data in modern distributed environments, where communication efficiency and security are usually the primary considerations. While the popular alternating direction method of multipliers (ADMM) provides a general computational solution, its slow convergence becomes a bottleneck when communication cost dominates local computational cost, as in Internet of Things (IoT) networks. Motivated by the residual projection technique, in this paper we propose an innovative iterative parallel framework, PIQR, that converges faster and has a more secure data transmission plan, and we establish its convergence property. This framework is further extended to composite quantile regression (CQR), a modified QR technique that improves estimation efficiency at extreme quantiles. Simulation studies show that both the ADMM-based method and PIQR enjoy favorable estimation accuracy in distributed environments. While PIQR is inferior to the ADMM-based method in local computation, it requires far fewer iterations to achieve convergence, and hence significantly improves the overall computational efficiency when communication cost is the dominating factor. Moreover, PIQR transmits only data involving the residual information between different machines, and can better prevent the leakage of important data information compared with the ADMM-based method.
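As a small illustration of the idea that QR models quantiles rather than the mean, the sketch below fits an intercept-only model by minimizing the standard check (pinball) loss. It is not the PIQR or ADMM-based method from the paper; the function names `check_loss`, `total_loss` and `best_constant` and the toy data are assumptions made for this example only.

```python
def check_loss(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(y, q, tau):
    """Sum of check losses of the residuals y_i - q."""
    return sum(check_loss(yi - q, tau) for yi in y)


def best_constant(y, tau):
    """Intercept-only quantile fit (hypothetical helper).

    The objective is convex and piecewise linear in q, so a minimizer
    can always be found among the observed values.
    """
    return min(sorted(y), key=lambda q: total_loss(y, q, tau))


# Skewed toy data (assumed for illustration): one large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]

mean_fit = sum(y) / len(y)          # pulled upward by the outlier
median_fit = best_constant(y, 0.5)  # tau = 0.5 recovers the sample median
upper_fit = best_constant(y, 0.9)   # tau = 0.9 targets the upper tail

print(mean_fit, median_fit, upper_fit)
```

On this skewed sample the mean is pulled toward the outlier, while the fits at different values of tau describe different parts of the distribution, which is the behaviour the abstract attributes to QR.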
Pages: 710-735
Number of pages: 26