Robust variable selection and distributed inference using τ-based estimators for large-scale data

Times cited: 0
Authors
Mozafari-Majd, Emadaldin [1]
Koivunen, Visa [1]
Affiliations
[1] Aalto Univ, Espoo, Finland
Funding
Academy of Finland;
Keywords
statistical inference; robust; sparse; high-dimensional; large-scale data; variable selection; bootstrap;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In this paper, we address the problem of performing robust statistical inference for large-scale data sets whose volume and dimensionality may be so high that distributed storage and processing are required. Here, the large-scale data are assumed to be contaminated by outliers and to exhibit sparseness. We propose a distributed and robust two-stage statistical inference method. In the first stage, robust variable selection is performed by exploiting tau-Lasso to find the sparse basis in each node, each of which holds a distinct subset of the data. The selected variables are communicated to a fusion center (FC), in which the variables for the complete data are chosen using a majority voting rule. In the second stage, confidence intervals and parameter estimates are found in each node using a robust tau-estimator combined with bootstrapping and are then combined in the FC. The simulation results demonstrate the validity and reliability of the algorithm in variable selection and in constructing confidence intervals, even if the estimation problem in the subsets is slightly underdetermined.
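The majority voting rule at the fusion center can be illustrated with a minimal sketch. It assumes each node reports a boolean support vector (one entry per candidate variable) and that a variable is kept when strictly more than half of the nodes selected it; the abstract does not state the exact threshold, so that choice is an assumption. The function name `majority_vote` and the example supports are hypothetical, and the node-level tau-Lasso step is not implemented here.

```python
import numpy as np


def majority_vote(supports):
    """Fuse per-node variable selections by strict majority.

    supports: array-like of shape (n_nodes, n_features); entry [k, j] is
    truthy when node k selected variable j. Returns a boolean mask of
    length n_features marking the variables kept for the complete data.
    """
    supports = np.asarray(supports, dtype=bool)
    n_nodes = supports.shape[0]
    votes = supports.sum(axis=0)   # number of nodes that selected each variable
    return votes > n_nodes / 2     # assumed threshold: strict majority


# Hypothetical supports reported by three nodes for five candidate variables.
node_supports = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0],
], dtype=bool)

selected = np.flatnonzero(majority_vote(node_supports))
print(selected)  # indices of the variables kept at the fusion center
```

Only the support vectors need to be sent to the fusion center in this sketch, so the communication cost per node is one bit per candidate variable.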
Pages: 2453-2457
Number of pages: 5
Related papers
50 in total
  • [1] Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data
    Mozafari-Majd, Emadaldin
    Koivunen, Visa
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2022, 70 : 5351 - 5365
  • [2] Using Data Accessibility for Resource Selection in Large-Scale Distributed Systems
    Kim, Jinoh
    Chandra, Abhishek
    Weissman, Jon B.
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2009, 20 (06) : 788 - 801
  • [3] Robust Scheduling for Large-Scale Distributed Systems
    Lee, Young Choon
    King, Jayden
    Kim, Young Ki
    Hong, Seok-Hee
    [J]. 2020 IEEE 19TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2020), 2020, : 38 - 45
  • [4] Distributed Bayesian Inference for Large-Scale IoT Systems
    Vlachou, Eleni
    Karras, Aristeidis
    Karras, Christos
    Theodorakopoulos, Leonidas
    Halkiopoulos, Constantinos
    Sioutas, Spyros
    [J]. BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (01)
  • [5] Large-scale inference of human genetic data
    Rivas, M. A.
    [J]. EUROPEAN JOURNAL OF HUMAN GENETICS, 2019, 27 : 1064 - 1064
  • [6] An Incremental and Distributed Inference Method for Large-Scale Ontologies Based on MapReduce Paradigm
    Liu, Bo
    Huang, Keman
    Li, Jianqiang
    Zhou, MengChu
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (01) : 53 - 64
  • [7] Antenna selection based on large-scale fading for distributed MIMO systems
    施荣华
    Yuan Zexi
    Dong Jian
    Lei Wentai
    Peng Chunhua
    [J]. High Technology Letters, 2016, 22 (03) : 233 - 240
  • [8] Outlier Detection in Large-Scale Sensor Network Data Using Shrinkage Estimators
    Wu, Ming-Chun
    Chen, Kwang-Cheng
    [J]. 2015 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2015,
  • [9] Distributed Privacy-Aware Fast Selection Algorithm for Large-Scale Data
    Liu, Hao
    Chen, Jiming
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (02) : 365 - 376
  • [10] Scalable Algorithms for Bayesian Inference of Large-Scale Models from Large-Scale Data
    Ghattas, Omar
    Isaac, Tobin
    Petra, Noemi
    Stadler, Georg
    [J]. HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2016, 2017, 10150 : 3 - 6