Online updating Huber robust regression for big data streams

Cited by: 1
Authors
Tao, Chunbai [1, 2]
Wang, Shanshan [1, 3]
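Affiliations
[1] Beihang University, School of Economics and Management, Beijing, People's Republic of China
[2] Fudan University, School of Data Science, Shanghai, People's Republic of China
[3] Beihang University, Key Laboratory of Complex System Analysis and Management Decision, Ministry of Education (MOE), Beijing, People's Republic of China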
Funding
National Natural Science Foundation of China
Keywords
Online updating; Huber regression; big data streams; divide-and-conquer; quantile regression
DOI
10.1080/02331888.2024.2398057
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Big data streams have attracted significant attention across many industries. However, the sheer volume of high-velocity streaming data and the outliers it often contains pose major challenges for analysis. To address these challenges, this paper introduces a novel Online Updating Huber Robust Regression algorithm. The proposed online updating estimator efficiently captures the salient features of each newly arriving data subset. It is also computationally efficient because it does not need to store historical data. Furthermore, because Huber regression is built into its framework, the estimator is robust to heavy-tailed, heterogeneous, and outlier-contaminated data. Theoretically, the proposed online updating estimator is asymptotically equivalent to an oracle estimator computed from the entire dataset. Extensive numerical simulations and a real-world data analysis demonstrate the effectiveness and practicality of the proposed method.
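The abstract does not state the updating rule itself. The Python sketch below is offered only as a hedged illustration and is not claimed to be the authors' algorithm. It pairs a per-batch Huber M-estimator with a cumulative estimating-equation (CEE) style combination in the spirit of Schifano et al. (2016).

For each batch b, the sketch does three things:

* It computes the local Huber fit beta_b by iteratively reweighted least squares. The Huber loss is rho_c(r) = r^2/2 for |r| <= c and c|r| - c^2/2 otherwise.
* It forms the Huber Hessian A_b = sum_i 1{|r_i| <= c} x_i x_i^T.
* It reports beta = (sum_b A_b)^(-1) sum_b A_b beta_b.

Only the p x p matrix S = sum_b A_b and the p-vector T = sum_b A_b beta_b are kept, so past batches can be discarded.

The sketch rests on two assumptions:

* The threshold c = 1.345 is applied to raw residuals. This treats the error scale as roughly 1, and no scale is estimated.
* All names (solve, weighted_gram, residuals, huber_fit, OnlineHuber) are hypothetical.

```python
import random

# Hypothetical sketch: CEE-style online Huber regression, not the paper's exact method.
# Assumption: c is applied to raw residuals (error scale ~ 1); no scale estimate.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (fine for small p)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented copy; A, b untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def weighted_gram(X, y, w):
    """Return (sum_i w_i x_i x_i^T, sum_i w_i x_i y_i)."""
    p = len(X[0])
    G = [[0.0] * p for _ in range(p)]
    h = [0.0] * p
    for xi, yi, wi in zip(X, y, w):
        for j in range(p):
            h[j] += wi * xi[j] * yi
            for k in range(p):
                G[j][k] += wi * xi[j] * xi[k]
    return G, h

def residuals(X, y, beta):
    return [yi - sum(b * xj for b, xj in zip(beta, xi)) for xi, yi in zip(X, y)]

def huber_fit(X, y, c=1.345, iters=200, tol=1e-8):
    """Local Huber M-estimate for one batch via IRLS, started from OLS."""
    beta = solve(*weighted_gram(X, y, [1.0] * len(y)))
    for _ in range(iters):
        # Huber IRLS weights: 1 inside [-c, c], c/|r| outside
        w = [1.0 if abs(r) <= c else c / abs(r) for r in residuals(X, y, beta)]
        new = solve(*weighted_gram(X, y, w))
        converged = max(abs(a - b) for a, b in zip(new, beta)) < tol
        beta = new
        if converged:
            break
    return beta

class OnlineHuber:
    """Online Huber regression storing only S = sum_b A_b and T = sum_b A_b beta_b."""

    def __init__(self, p, c=1.345):
        self.c = c
        self.S = [[0.0] * p for _ in range(p)]
        self.T = [0.0] * p

    def update(self, X, y):
        beta_b = huber_fit(X, y, self.c)
        # A_b: Hessian of the Huber loss at beta_b, since psi'(r) = 1{|r| <= c}
        inlier = [1.0 if abs(r) <= self.c else 0.0 for r in residuals(X, y, beta_b)]
        A_b, _ = weighted_gram(X, y, inlier)
        p = len(self.T)
        for j in range(p):
            self.T[j] += sum(A_b[j][k] * beta_b[k] for k in range(p))
            for k in range(p):
                self.S[j][k] += A_b[j][k]
        return self.estimate()  # the batch (X, y) is no longer needed

    def estimate(self):
        return solve(self.S, self.T)

random.seed(0)
true_beta = [1.0, 2.0, -1.0]
model = OnlineHuber(p=3)
for batch in range(10):  # ten batches arriving one after another
    X, y = [], []
    for _ in range(500):
        xi = [1.0, random.gauss(0, 1), random.gauss(0, 1)]
        e = random.gauss(0, 1)
        if random.random() < 0.05:  # about 5% symmetric gross outliers
            e += random.choice([-50.0, 50.0])
        X.append(xi)
        y.append(sum(b * xj for b, xj in zip(true_beta, xi)) + e)
    est = model.update(X, y)
print([round(b, 3) for b in est])
```

With a single batch, the combined estimate reproduces that batch's Huber fit. As batches accumulate, the A_b-weighted average behaves like a fit to all data seen so far. This mirrors the flavor of the oracle equivalence stated in the abstract, although the sketch does not establish it. The indicator-based Hessian gives zero weight to points with |r| > c, which is where gross outliers are kept from dominating the combination.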
Pages: 1197-1223
Page count: 27
Related papers (10 of 50 shown)
  • [1] Online updating method with new variables for big data streams
    Wang, Chun; Chen, Ming-Hui; Wu, Jing; Yan, Jun; Zhang, Yuping; Schifano, Elizabeth
    Canadian Journal of Statistics / Revue Canadienne de Statistique, 2018, 46(1): 123-146
  • [2] An Online Robust Support Vector Regression for Data Streams
    Yu, Hang; Lu, Jie; Zhang, Guangquan
    IEEE Transactions on Knowledge and Data Engineering, 2022, 34(1): 150-163
  • [3] Online updating method to correct for measurement error in big data streams
    Lee, JooChul; Wang, HaiYing; Schifano, Elizabeth D.
    Computational Statistics & Data Analysis, 2020, 149
  • [4] A Variant of Huber Robust Regression
    Boncelet, C. G.; Dickinson, B. W.
    SIAM Journal on Scientific and Statistical Computing, 1984, 5(3): 720-734
  • [5] Online Updating Algorithms of Statistical Methods for Big Data
    Li, Yihao; Wang, Jin
    2019 2nd International Conference on Computing and Big Data (ICCBD 2019), 2019: 81-85
  • [6] Online Updating of Statistical Inference in the Big Data Setting
    Schifano, Elizabeth D.; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
    Technometrics, 2016, 58(3): 393-403
  • [7] Online Anomaly Detection over Big Data Streams
    Rettig, Laura; Khayati, Mourad; Cudre-Mauroux, Philippe; Piorkowski, Michal
    Proceedings of the 2015 IEEE International Conference on Big Data, 2015: 1113-1122
  • [8] Applications of Robust Regression to "Big" Data Problems
    Sheather, Simon J.
    Robust Rank-Based and Nonparametric Methods, 2016, 168: 101-120
  • [9] Online Meta-Forest for Regression Data Streams
    Shaker, Ammar; Gartner, Christoph; He, Xiao; Yu, Shujian
    2020 International Joint Conference on Neural Networks (IJCNN), 2020
  • [10] Incremental Huber-Support vector regression based online robust parameter design
    Zhou, Xiaojian; Xiao, Dan; Yu, Jieyao; Jiang, Ting
    Communications in Statistics - Theory and Methods, 2024, 53(8): 2924-2944