Comparison of fast regression algorithms in large datasets

Times cited: 0
Authors
Cangur, Sengul [1]
Ankarali, Handan [2]
Affiliations
[1] Duzce Univ, Dept Biostat & Med Informat, Duzce, Turkiye
[2] Istanbul Medeniyet Univ, Dept Biostat & Med Informat, Istanbul, Turkiye
Keywords
Dimensional reduction; large data; robust; variance inflation factor; variable selection; VIF regression
DOI
Not available
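Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]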
Discipline codes
07; 0710; 09
Abstract
The aim of this study is to compare the performance of four fast regression methods, namely dimensional reduction of the correlation matrix (DRCM), nonparametric dimensional reduction of the correlation matrix (N-DRCM), variance inflation factor (VIF) regression, and robust VIF (R-VIF) regression, in the presence of multicollinearity and outliers. In all simulation scenarios, all four methods selected every target variable into the final models. DRCM and N-DRCM reached the final model in the shortest times, in that order. R-VIF regression reached the final model in approximately half the time required by VIF regression. For each method, the time taken to reach the final model increased as the number of variables and the level of outliers increased. When the level of multicollinearity and the number of variables (p > 500) increased, DRCM reached the final models slightly faster than N-DRCM in datasets with outliers. DRCM and N-DRCM selected the largest numbers of noise variables into the model, whereas R-VIF regression selected the fewest. The RMSE values obtained with DRCM, N-DRCM and VIF regression were similar in each scenario. On the real dataset, the final model selected by R-VIF regression had the highest R²; its RMSE was also the lowest among the approaches other than VIF regression. Overall, R-VIF regression performed better than the other methods across all datasets.
Pages: 1
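The methods compared above are all organized around the variance inflation factor. As a reminder of that quantity only (this is not the DRCM, N-DRCM, VIF regression or R-VIF regression procedure evaluated in the study), the minimal sketch below computes VIF_j = 1 / (1 - R_j²) for each predictor. It uses the standard identity that VIF_j is the j-th diagonal element of the inverse of the predictors' correlation matrix. The function names `correlation_matrix`, `invert_matrix` and `vif` are hypothetical. The sketch assumes non-constant, not perfectly collinear columns, and it is written in plain Python so that it has no dependencies.

```python
import math
from typing import List

Matrix = List[List[float]]


def correlation_matrix(columns: List[List[float]]) -> Matrix:
    """Pearson correlation matrix of predictor columns (each column must be non-constant)."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    p = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(p)
        ]
        for i in range(p)
    ]


def invert_matrix(m: Matrix, tol: float = 1e-12) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(m)
    # Augment [m | I] and reduce the left half to the identity.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < tol:
            raise ValueError("matrix is singular: predictors are perfectly collinear")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half now holds the inverse of m.
    return [row[p:] for row in aug]


def vif(columns: List[List[float]]) -> List[float]:
    """VIF_j = 1 / (1 - R_j^2), read off as the j-th diagonal element of R^{-1}."""
    r_inv = invert_matrix(correlation_matrix(columns))
    return [r_inv[j][j] for j in range(len(columns))]
```

For two predictors with correlation r, both VIFs equal 1/(1 - r²). For example, `vif([[1, 2, 3, 4, 5], [2, 1, 4, 3, 6]])` gives 148/48 (about 3.083) for each column, because r² = 100/148 for these two columns. Uncorrelated columns give a VIF of 1.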