Quantile regression in big data: A divide and conquer based strategy

Cited by: 37
Authors
Chen, Lanjue [1, 3, 4]
Zhou, Yong [1, 2]
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China
[2] East China Normal Univ, Key Lab Adv Theory & Applicat Stat & Data Sci, MOE, Acad Stat & Interdisciplinary Sci, Shanghai 200062, Peoples R China
[3] City Univ Hong Kong, Dept Management Sci, Kowloon, Hong Kong, Peoples R China
[4] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Data stream; Divide and conquer; Estimating equation; Massive data sets; Quantile regression; WAGE STRUCTURE; COMPRESSION; MODELS;
DOI
10.1016/j.csda.2019.106892
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Quantile regression, which analyzes the conditional distribution of outcomes given a set of covariates, has been widely used in many fields. However, the volume and velocity of big data make the estimation of quantile regression models extremely difficult because of the intensive computation and limited storage involved. Based on a divide and conquer strategy, a simple and efficient method is proposed to address this problem. The proposed approach keeps only summary statistics of each data block and then uses them to reconstruct the estimator for the entire data set with asymptotically negligible approximation error. This property makes the proposed method particularly appealing when data blocks are retained on multiple servers or arrive in the form of a data stream. Furthermore, the proposed estimator is shown to be consistent and asymptotically as efficient as the estimating equation estimator calculated using the entire data set at once when certain conditions hold. The merits of the proposed method are illustrated using both simulation studies and real data analysis. (C) 2019 Elsevier B.V. All rights reserved.
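The general divide and conquer idea described in the abstract can be sketched as follows. This is a minimal illustration only, not the estimator proposed in the paper: it assumes a one-covariate linear quantile model, fits each block with a simple iteratively reweighted least squares (IRLS) approximation to the check loss, and combines the blocks by a sample-size-weighted average of the block coefficients. The function names (`fit_block`, `summarize`, `combine`, `demo`), the simulated data, and the choice of summary statistics (block size and block coefficients) are all assumptions made for the example.

```python
import random


def fit_block(xs, ys, tau, iters=50, eps=1e-4):
    """Approximate quantile regression y ~ a + b*x on one block via IRLS."""
    a, b = 0.0, 0.0
    w = [1.0] * len(xs)
    for _ in range(iters):
        # Closed-form weighted least squares for intercept a and slope b.
        sw = sum(w)
        sx = sum(wi * x for wi, x in zip(w, xs))
        sy = sum(wi * y for wi, y in zip(w, ys))
        sxx = sum(wi * x * x for wi, x in zip(w, xs))
        sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
        det = sw * sxx - sx * sx
        b = (sw * sxy - sx * sy) / det
        a = (sy - b * sx) / sw
        # Reweight so that w * r^2 equals the check loss
        # rho_tau(r) = |r| * (tau if r > 0 else 1 - tau), up to the eps floor.
        w = []
        for x, y in zip(xs, ys):
            r = y - a - b * x
            w.append((tau if r > 0 else 1.0 - tau) / max(abs(r), eps))
    return a, b


def summarize(block, tau):
    """Reduce one block of (x, y) pairs to a small summary (n, a, b)."""
    xs, ys = zip(*block)
    a, b = fit_block(xs, ys, tau)
    return len(block), a, b


def combine(summaries):
    """Combine block summaries by a sample-size-weighted average."""
    n = sum(s[0] for s in summaries)
    a = sum(s[0] * s[1] for s in summaries) / n
    b = sum(s[0] * s[2] for s in summaries) / n
    return a, b


def demo(tau=0.5, n_blocks=10, block_size=500, seed=0):
    """Simulate y = 1 + 2x + N(0, 1) noise and process it block by block."""
    rng = random.Random(seed)
    summaries = []
    for _ in range(n_blocks):
        block = []
        for _ in range(block_size):
            x = rng.uniform(0.0, 1.0)
            block.append((x, 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)))
        # Only the summary is kept; the raw block can be discarded.
        summaries.append(summarize(block, tau))
    return combine(summaries)


if __name__ == "__main__":
    print(demo())
```

Because only one small tuple per block is stored, the same code applies whether the blocks sit on separate servers or arrive one at a time as a stream. Simple averaging is used here only because it is easy to follow; the abstract indicates that the paper's estimator is built from estimating equations, so its combination step should be expected to differ.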
Pages: 17
Related papers
50 in total
  • [1] Divide and conquer kernel quantile regression for massive dataset
    Bang, Sungwan
    Kim, Jaeoh
    [J]. KOREAN JOURNAL OF APPLIED STATISTICS, 2020, 33 (05) : 569 - 578
  • [2] ADMM for Penalized Quantile Regression in Big Data
    Yu, Liqun
    Lin, Nan
    [J]. INTERNATIONAL STATISTICAL REVIEW, 2017, 85 (03) : 494 - 518
  • [3] Optimal subsampling for quantile regression in big data
    Wang, Haiying
    Ma, Yanyuan
    [J]. BIOMETRIKA, 2021, 108 (01) : 99 - 112
  • [4] Bayesian Quantile Regression for Big Data Analysis
    Chu, Yuanqi
    Hu, Xueping
    Yu, Keming
    [J]. NEW FRONTIERS IN BAYESIAN STATISTICS, BAYSM 2021, 2022, 405 : 11 - 22
  • [5] Distributed quantile regression for longitudinal big data
    Fan, Ye
    Lin, Nan
    Yu, Liqun
    [J]. COMPUTATIONAL STATISTICS, 2024, 39 (02) : 751 - 779
  • [6] Distributed quantile regression for longitudinal big data
    Ye Fan
    Nan Lin
    Liqun Yu
    [J]. Computational Statistics, 2024, 39 : 751 - 779
  • [7] Model selection via Bayesian information criterion for divide-and-conquer penalized quantile regression
    Kang, Jongkyeong
    Han, Seokwon
    Bang, Sungwan
    [J]. KOREAN JOURNAL OF APPLIED STATISTICS, 2022, 35 (02) : 217 - 227
  • [8] Research on automatic dimensioning based on divide and conquer strategy
    Lu, G.-D.
    Huang, C.-L.
    Peng, Q.-S.
    [J]. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2001, 13 (06): : 521 - 526
  • [9] Divide and conquer local average regression
    Chang, Xiangyu
    Lin, Shao-Bo
    Wang, Yao
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2017, 11 (01): : 1326 - 1350
  • [10] Learning computer programming using "divide and conquer" strategy vs. without "divide and conquer strategy"
    Trejos, O., I
    Munoz, L. E.
    [J]. ENTRE CIENCIA E INGENIERIA, 2020, 14 (28): : 34 - 39