Residual projection for quantile regression in vertically partitioned big data

Cited by: 1

Authors:

Fan, Ye [1]

Li, Jr-Shin [2]

Lin, Nan [3]

Affiliations:

[1] Capital Univ Econ & Business, Sch Stat, Beijing 100070, Peoples R China

[2] Washington Univ St Louis, Dept Elect & Syst Engn, St Louis, MO 63130 USA

[3] Washington Univ St Louis, Dept Math & Stat, St Louis, MO 63130 USA

Source:

DATA MINING AND KNOWLEDGE DISCOVERY | 2023 / Vol. 37 / Issue 02

Keywords:

ADMM; Parallel framework; Privacy preservation; Quantile regression; Residual projection; Vertically distributed big data; COORDINATE DESCENT; ALGORITHMS; SELECTION;

DOI:

10.1007/s10618-022-00914-4

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Standard regression techniques model only the mean of the response variable. Quantile regression (QR) is more powerful in that it depicts a comprehensive relationship between the response variable and independent covariates at different quantiles. It is particularly useful for non-normally distributed data with skewness or heterogeneity, which appear routinely in many scientific fields, such as economics, finance, public health and biology. Although its theory has been well developed in the literature, its computation in big data still faces multiple challenges, especially for vertically stored big data in modern distributed environments, where communication efficiency and security are usually the primary considerations. While the popular alternating direction method of multipliers (ADMM) provides a general computational solution, its slow convergence becomes a bottleneck when communication cost dominates local computation cost, as in Internet of Things (IoT) networks. Motivated by the residual projection technique, in this paper we propose an innovative iterative parallel framework, PIQR, that converges faster and has a more secure data transmission plan, and we establish its convergence property. This framework is further extended to composite quantile regression (CQR), a modified QR technique that improves estimation efficiency at extreme quantiles. Simulation studies show that both the ADMM-based method and PIQR enjoy favorable estimation accuracy in distributed environments. While PIQR is inferior to the ADMM-based method in local computation, it requires far fewer iterations to achieve convergence, and hence significantly improves the overall computational efficiency when communication cost is the dominating factor. Moreover, PIQR transmits only data involving the residual information between different machines, and can better prevent the leakage of important data information compared with the ADMM-based method.
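
As a rough illustration of the quantile regression idea the abstract refers to, the sketch below shows the standard check (pinball) loss and confirms on toy data that minimising it over a constant recovers a sample quantile. It is not the PIQR or ADMM algorithm from the paper. The names `check_loss` and `fit_constant_quantile`, and the exponential toy data, are assumptions made only for this example.

```python
import numpy as np

def check_loss(u, tau):
    """Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    u = np.asarray(u, dtype=float)
    return np.where(u < 0, tau - 1.0, tau) * u

def fit_constant_quantile(y, tau):
    """Minimise the mean check loss over a constant c.

    A minimiser can always be found among the observed values,
    so a search over the data points is sufficient here.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).mean() for c in y]
    return y[int(np.argmin(losses))]

# Skewed toy data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=501)

q50 = fit_constant_quantile(y, 0.5)  # close to the sample median
q90 = fit_constant_quantile(y, 0.9)  # close to the 0.9 sample quantile
print(q50, np.median(y))
print(q90, np.quantile(y, 0.9))
```

With covariates, the constant is replaced by a linear predictor and the same loss is minimised over the coefficients. In the vertically partitioned setting described above, each machine would hold only a subset of the covariate columns.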


Pages: 710-735

Number of pages: 26

Related records (50 in total):

[21] VPPLR: Privacy-preserving logistic regression on vertically partitioned data using vectorization sharing
Zhang, Yuhao
Tang, Min
[J]. JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2024, 82
[22] Differentially Private Publication of Vertically Partitioned Data
Tang, Peng
Cheng, Xiang
Su, Sen
Chen, Rui
Shao, Huaxi
[J]. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2021, 18 (02) : 780 - 795
[23] Distributed prediction from vertically partitioned data
Skillicorn, D. B.
McConnell, S. M.
[J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2008, 68 (01) : 16 - 36
[24] Privacy preserving DBSCAN for vertically partitioned data
Amirbekyan, Artak
Estivill-Castro, V.
[J]. INTELLIGENCE AND SECURITY INFORMATICS, PROCEEDINGS, 2006, 3975 : 141 - 153
[25] A classification paradigm for distributed vertically partitioned data
Basak, J
Kothari, R
[J]. NEURAL COMPUTATION, 2004, 16 (07) : 1525 - 1544
[26] Extended ADMM for general penalized quantile regression with linear constraints in big data
Liu, Yongxin
Zeng, Peng
[J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2023,
[27] Penalized Quantile Regression for Distributed Big Data Using the Slack Variable Representation
Fan, Ye
Lin, Nan
Yin, Xianjun
[J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2021, 30 (03) : 557 - 565
[28] Quantile residual life regression based on semi-competing risks data
Hsieh, Jin-Jian
Wang, Jian-Lin
[J]. JOURNAL OF APPLIED STATISTICS, 2018, 45 (10) : 1770 - 1780
[29] Bayesian scale mixtures of normals linear regression and Bayesian quantile regression with big data and variable selection
Chu, Yuanqi
Yin, Zhouping
Yu, Keming
[J]. JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2023, 428
[30] Outlier Detection using Projection Quantile Regression for Mass Spectrometry Data with Low Replication
Soo-Heang Eo
Daewoo Pak
Jeea Choi
HyungJun Cho
[J]. BMC Research Notes, 5 (1)
