Fast robust variable selection using VIF regression in large datasets

Cited by: 2
Authors
Seo, Han Son [1]
Affiliation
[1] Konkuk Univ, Dept Appl Stat, 120 Neungdong Ro, Seoul 05029, South Korea
Keywords
large dataset; linear regression; stagewise regression; variable selection
DOI
10.5351/KJAS.2018.31.4.463
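Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract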
This article considers variable selection algorithms for linear regression models on large datasets. Many algorithms have been proposed that focus on speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate because it uses a streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by least squares. A robust criterion based on a weighted estimator has been proposed to make the algorithm robust, and a robust VIF regression has been proposed for the same purpose. This article suggests a fast and robust variable selection method that combines VIF regression with the detection and removal of potential outliers. A simulation study and an analysis of a dataset compare the suggested method with other methods.
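The abstract's point that a least-squares fit is susceptible to outliers, and that removing flagged points before fitting can help, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the single-predictor setup, the simulated data, the helper names `ls_fit` and `flag_outliers`, the median-based (Theil-Sen style) initial line, and the cutoff of 3 scaled MADs. It does not implement VIF regression or the article's detection rule.

```python
import numpy as np

def ls_fit(x, y):
    """Ordinary least-squares fit for one predictor; returns [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def flag_outliers(x, y, cutoff=3.0):
    """Hypothetical rule: flag points far from a median-based line.

    The initial line uses the median of pairwise slopes, and residuals are
    scaled by the median absolute deviation (MAD). This is an illustrative
    choice, not the detection step used in the article.
    """
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    ok = dx != 0
    slope = np.median((y[j] - y[i])[ok] / dx[ok])
    intercept = np.median(y - slope * x)
    resid = y - (intercept + slope * x)
    scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    return np.abs(resid) > cutoff * scale

# Simulated data (assumed): true intercept 1, true slope 2, one gross outlier.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
y[-1] += 100.0  # contaminate the point with the largest x

naive = ls_fit(x, y)                   # least squares on all points
mask = flag_outliers(x, y)             # True where a point is flagged
cleaned = ls_fit(x[~mask], y[~mask])   # least squares after removal

print("LS slope with the outlier:", naive[1])
print("LS slope after removal:   ", cleaned[1])
```

In this toy setting a single contaminated point pulls the least-squares slope away from the true value, while the fit on the remaining points stays close to it. A robust initial fit is used for flagging because residuals from the contaminated least-squares fit can mask the outlier.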
Pages: 463-473
Page count: 11
Related papers
50 in total
  • [1] ROBUST VIF REGRESSION WITH APPLICATION TO VARIABLE SELECTION IN LARGE DATA SETS
    Dupuis, Debbie J.
    Victoria-Feser, Maria-Pia
    [J]. ANNALS OF APPLIED STATISTICS, 2013, 7 (01) : 319 - 341
  • [2] Fast Robust Model Selection in Large Datasets
    Dupuis, Debbie J.
    Victoria-Feser, Maria-Pia
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2011, 106 (493) : 203 - 212
  • [3] VIF Regression: A Fast Regression Algorithm For Large Data
    Lin, Dongyu
    Foster, Dean P.
    [J]. 2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 848 - 853
  • [4] VIF Regression: A Fast Regression Algorithm for Large Data
    Lin, Dongyu
    Foster, Dean P.
    Ungar, Lyle H.
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2011, 106 (493) : 232 - 247
  • [5] Bayesian Variable Selection in Linear Regression in One Pass for Large Datasets
    Ordonez, Carlos
    Garcia-Alvarado, Carlos
    Baladandayuthapani, Veerabhadaran
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2014, 9 (01)
  • [6] Robust distributed estimation and variable selection for massive datasets via rank regression
    Jiaming Luan
    Hongwei Wang
    Kangning Wang
    Benle Zhang
    [J]. Annals of the Institute of Statistical Mathematics, 2022, 74 : 435 - 450
  • [7] Robust distributed estimation and variable selection for massive datasets via rank regression
    Luan, Jiaming
    Wang, Hongwei
    Wang, Kangning
    Zhang, Benle
    [J]. ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2022, 74 (03) : 435 - 450
  • [8] Fast robust variable selection
    Van Aelst, Stefan
    Khan, Jafar A.
    Zamar, Ruben H.
    [J]. COMPSTAT 2008: PROCEEDINGS IN COMPUTATIONAL STATISTICS, 2008, : 359 - +
  • [9] Comparison of fast regression algorithms in large datasets
    Cangur, Sengul
    Ankarali, Handan
    [J]. KUWAIT JOURNAL OF SCIENCE, 2023, 50 (02)
  • [10] ROBUST CRITERION FOR VARIABLE SELECTION IN LINEAR REGRESSION
    Patil, A. B.
    Kashid, D. N.
    [J]. INTERNATIONAL JOURNAL OF AGRICULTURAL AND STATISTICAL SCIENCES, 2009, 5 (02) : 509 - 521