Robust variable selection and distributed inference using τ-based estimators for large-scale data

Authors
Mozafari-Majd, Emadaldin [1]
Koivunen, Visa [1]
Affiliations
[1] Aalto Univ, Espoo, Finland
Funding
Academy of Finland;
Keywords
statistical inference; robust; sparse; high-dimensional; large-scale data; variable selection; bootstrap;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In this paper, we address the problem of performing robust statistical inference for large-scale data sets whose volume and dimensionality may be so high that distributed storage and processing are required. The data are assumed to be contaminated by outliers and to exhibit sparsity. We propose a distributed and robust two-stage statistical inference method. In the first stage, robust variable selection is performed with the τ-Lasso, which finds the sparse basis at each node from that node's distinct subset of the data. The selected variables are communicated to a fusion center (FC), where the variables for the complete data are chosen by a majority voting rule. In the second stage, confidence intervals and parameter estimates are computed at each node using a robust τ-estimator combined with bootstrapping, and are then combined at the FC. The simulation results demonstrate the validity and reliability of the algorithm in variable selection and in constructing confidence intervals, even when the estimation problem in the subsets is slightly underdetermined.
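The fusion step of the first stage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that a majority voting rule is used, so the strict "more than half of the nodes" threshold, the function name `majority_vote`, and the example supports are all assumptions made for illustration. The per-node τ-Lasso fits are not shown; each node is assumed to have already reported a boolean mask of its selected variables.

```python
import numpy as np

def majority_vote(supports):
    """Fuse per-node selections: keep a variable if more than half the nodes chose it.

    The strict-majority threshold is an assumption; the abstract does not
    specify the exact voting rule.
    """
    supports = np.asarray(supports, dtype=bool)   # shape: (n_nodes, n_variables)
    votes = supports.sum(axis=0)                  # nodes that selected each variable
    return votes > supports.shape[0] / 2          # boolean mask of fused selection

# Hypothetical supports reported by three nodes for five candidate variables.
node_supports = [
    [True,  True,  False, False, True],
    [True,  False, False, True,  True],
    [True,  True,  False, False, False],
]

# Indices of the variables kept at the fusion center.
selected = np.flatnonzero(majority_vote(node_supports))
print(selected)
```

In this toy input, a variable needs at least two of the three votes to survive fusion, so a variable picked by a single node is discarded.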
Pages: 2453 - 2457
Number of pages: 5