Online updating Huber robust regression for big data streams

Cited by: 1
Authors
Tao, Chunbai [1, 2]
Wang, Shanshan [1, 3]
Affiliations
[1] Beihang Univ, Sch Econ & Management, Beijing, Peoples R China
[2] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[3] Beihang Univ, MOE, Key Lab Complex Syst Anal & Management Decis, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Online updating; Huber regression; big data streams; divide-and-conquer; quantile regression
DOI
10.1080/02331888.2024.2398057
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Big data streams have garnered significant attention in multiple industries. However, the immense volume and the presence of outliers in high-velocity streaming data pose great challenges to its analysis. To address these concerns, this paper introduces a novel Online Updating Huber Robust Regression algorithm. By efficiently capturing the salient features of new data subsets, a computationally efficient online updating estimator is proposed without the need for storing historical data. Furthermore, by incorporating Huber regression into its framework, the estimator exhibits robustness to heavy-tailed, heterogeneous as well as outlier-contaminated data. Theoretically, the proposed online updating estimator is asymptotically equivalent to an Oracle estimator derived from the entire dataset. Extensive numerical simulations and a real-world data analysis have been conducted to demonstrate the effectiveness and practicality of the proposed method.
Pages: 1197-1223
Number of pages: 27
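
Illustrative sketch
The abstract describes an estimator that is updated batch by batch without storing historical data. The record does not give the paper's update formulas, so the code below is only a generic illustration of that idea and not the authors' algorithm. It assumes a common online-updating scheme: each incoming batch is fitted by Huber regression (here via iteratively reweighted least squares), and the running estimate is a matrix-weighted average of the batch estimates. The names `huber_irls` and `OnlineHuber`, the threshold `tau=1.345`, and the choice of weighting matrix are all assumptions made for this sketch.

```python
import numpy as np


def huber_irls(X, y, tau=1.345, n_iter=50, tol=1e-8):
    """Fit Huber regression on one batch by iteratively reweighted least squares.

    Returns the batch estimate and the Hessian of the Huber loss at that
    estimate (sum of x_i x_i^T over observations with |residual| <= tau).
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        r = y - X @ beta
        abs_r = np.maximum(np.abs(r), 1e-12)
        # Huber weights: 1 for small residuals, tau/|r| for large ones
        w = np.where(abs_r <= tau, 1.0, tau / abs_r)
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.linalg.norm(beta_new - beta) < tol
        beta = beta_new
        if converged:
            break
    inside = (np.abs(y - X @ beta) <= tau).astype(float)
    J = (X.T * inside) @ X
    return beta, J


class OnlineHuber:
    """Hypothetical online updater: keeps only a p x p matrix and a p-vector."""

    def __init__(self, p, tau=1.345):
        self.tau = tau
        self.J = np.zeros((p, p))   # accumulated weighting matrices
        self.Jb = np.zeros(p)       # accumulated J_k @ beta_k
        self.beta = np.zeros(p)     # current cumulative estimate

    def update(self, X, y):
        """Absorb one batch; the raw batch can be discarded afterwards."""
        beta_k, J_k = huber_irls(X, y, self.tau)
        self.J += J_k
        self.Jb += J_k @ beta_k
        self.beta = np.linalg.solve(self.J, self.Jb)
        return self.beta
```

A typical use is to call `model.update(X_batch, y_batch)` once per arriving batch and read `model.beta` at any time. Memory use depends only on the number of covariates, not on how many batches have been seen. The sketch assumes each batch has enough small-residual observations for the weighting matrix to be invertible.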