Robust variable selection and distributed inference using τ-based estimators for large-scale data

Authors
Mozafari-Majd, Emadaldin [1]
Koivunen, Visa [1]
Affiliations
[1] Aalto Univ, Espoo, Finland
Funding
Academy of Finland;
Keywords
statistical inference; robust; sparse; high-dimensional; large-scale data; variable selection; bootstrap;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
In this paper, we address the problem of performing robust statistical inference for large-scale data sets whose volume and dimensionality may be so high that distributed storage and processing are required. The large-scale data are assumed to be contaminated by outliers and to exhibit sparseness. We propose a distributed and robust two-stage statistical inference method. In the first stage, robust variable selection is performed with the τ-Lasso, which finds the sparse basis in each node, where each node holds a distinct subset of the data. The selected variables are communicated to a fusion center (FC), in which the variables for the complete data are chosen using a majority voting rule. In the second stage, confidence intervals and parameter estimates are computed in each node using a robust τ-estimator combined with bootstrapping, and are then combined at the FC. The simulation results demonstrate the validity and reliability of the algorithm in variable selection and in constructing confidence intervals, even if the estimation problem in the subsets is slightly underdetermined.
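The fusion step of the first stage can be illustrated with a minimal sketch. It assumes that each node reports the indices of the variables it selected and that "majority" means selected by strictly more than half of the nodes; the abstract does not state the exact threshold. The function name `majority_vote` and the example supports are hypothetical and are not taken from the paper. The per-node τ-Lasso fit itself is not shown.

```python
from collections import Counter


def majority_vote(node_supports):
    """Fuse per-node variable selections at the fusion center.

    node_supports: list of iterables, one per node, each holding the
    indices of the variables that node selected.
    Returns the sorted indices chosen by strictly more than half of the
    nodes (assumed threshold; the abstract only says "majority voting").
    """
    num_nodes = len(node_supports)
    # Count each index at most once per node.
    counts = Counter(idx for support in node_supports for idx in set(support))
    return sorted(idx for idx, c in counts.items() if c > num_nodes / 2)


# Hypothetical supports reported by five nodes (illustrative values only).
node_supports = [
    {0, 3, 7},
    {0, 3, 9},
    {0, 7},
    {0, 3, 7, 12},
    {3, 7},
]
fused_support = majority_vote(node_supports)
print(fused_support)  # variables kept for the second stage
```

In this example, variables 0, 3, and 7 are each selected by four of the five nodes and are kept, while variables 9 and 12 are each selected by a single node and are dropped.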
Pages: 2453-2457
Number of pages: 5