On the selection of optimal subdata for big data regression based on leverage scores

Times cited: 0
Authors
Chasiotis, Vasilis [1]
Karlis, Dimitris [1]
Affiliations
[1] Athens Univ Econ & Business, Dept Stat, Athens, Greece
Keywords
D-optimal designs; Design of experiments; Subdata; Linear regression; Information matrix;
DOI
10.1007/s42519-024-00420-4
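Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes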
020208 ; 070103 ; 0714 ;
Abstract
The demand for computational resources in the modeling process grows with the scale of the datasets, since traditional approaches to regression involve inverting huge data matrices. The main problem lies in the large data size, so a standard approach is subsampling, which aims to obtain the most informative portion of the big data. In the current paper, we explore an existing approach based on leverage scores, originally proposed for subdata selection in linear model discrimination. Our objective is to propose this approach for selecting the most informative data points to estimate the unknown parameters in both the first-order linear model and a model with interactions. Supported by simulation experiments and a real data application, we conclude that the approach based on leverage scores improves on existing approaches.
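As a rough illustration of the idea in the abstract, the sketch below computes leverage scores for a first-order linear model and keeps the k points with the largest scores. It is not the authors' algorithm: the selection rule (top-k leverage), the simulated data, and the names `leverage_scores`, `select_subdata`, and `log_det_information` are assumptions made for illustration only. The leverage score of point i is taken as the i-th diagonal entry of the hat matrix X (X^T X)^{-1} X^T, obtained here from a thin QR factorization so that X^T X is never inverted explicitly.

```python
import numpy as np


def leverage_scores(X):
    """Diagonal of the hat matrix X (X^T X)^{-1} X^T, via a thin QR factorization."""
    Q, _ = np.linalg.qr(X, mode="reduced")
    # With X = QR, the hat matrix equals Q Q^T, so h_ii is the squared norm of row i of Q.
    return np.sum(Q ** 2, axis=1)


def select_subdata(h, k):
    """Indices of the k largest leverage scores (illustrative rule, assumed here)."""
    idx = np.argpartition(h, -k)[-k:]
    return np.sort(idx)


def log_det_information(X):
    """log det of the information matrix X^T X (the D-optimality criterion)."""
    _, logdet = np.linalg.slogdet(X.T @ X)
    return logdet


# Simulated data (hypothetical): n points, p covariates, first-order model with intercept.
rng = np.random.default_rng(0)
n, p, k = 10_000, 5, 200
Z = rng.standard_normal((n, p))
X = np.column_stack([np.ones(n), Z])
beta = np.arange(1, p + 2, dtype=float)
y = X @ beta + rng.standard_normal(n)

h = leverage_scores(X)
idx = select_subdata(h, k)

# Ordinary least squares on the selected subdata only.
beta_hat, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)

# Compare the D-criterion with a uniform random subsample of the same size.
rand_idx = rng.choice(n, size=k, replace=False)
print("log det, leverage subdata:", log_det_information(X[idx]))
print("log det, random subdata:  ", log_det_information(X[rand_idx]))
print("estimated coefficients:   ", beta_hat)
```

For a model with interactions, the same functions could be applied after appending the product columns to `X`; that extension is likewise only a sketch.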
Pages: 19
Related papers
50 items in total
  • [41] Rule Based Regression and Feature Selection for Biological Data
    Liu, Sheng
    Dissanayake, Shamitha
    Patel, Sanjay
    Dang, Xin
    Mlsna, Todd
    Chen, Yixin
    Wilkins, Dawn
    2013 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2013,
  • [42] High leverage points and vertical outliers resistant model selection in regression
    Shende, Kundalik S.
    Kashid, Dattatraya N.
    HACETTEPE JOURNAL OF MATHEMATICS AND STATISTICS, 2021, 50 (06): : 1773 - 1792
  • [43] THE BOOTSTRAP-BASED SELECTION CRITERIA: AN OPTIMAL CHOICE FOR MODEL SELECTION IN LINEAR REGRESSION
    Shang, Junfeng
    ADVANCES AND APPLICATIONS IN STATISTICS, 2010, 14 (02) : 173 - 189
  • [44] Autonomous Vehicles Safe-Optimal Trajectory Selection Based on Big Data Analysis and Predefined User Preferences
    Al Najada, Hamzah
    Mahgoub, Imad
    2016 IEEE 7TH ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS MOBILE COMMUNICATION CONFERENCE (UEMCON), 2016,
  • [45] Research on Multi-Dimensional Optimal Location Selection of Maintenance Station Based on Big Data of Vehicle Trajectory
    Zhang, Shoujing
    Tong, Fujiao
    Li, Mengdan
    Jin, Shoufeng
    Li, Zhixiong
    ENTROPY, 2021, 23 (05)
  • [46] NIH Recruits Centers to Lead Effort to Leverage "Big Data"
    Kuehn, Bridget M.
    JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION, 2013, 310 (08): : 787 - 787
  • [47] A compensatory approach to optimal selection with mastery scores
    van der Linden, WJ
    Vos, HJ
    PSYCHOMETRIKA, 1996, 61 (01) : 155 - 172
  • [48] The optimal combination model building and application of linear regression based on prediction of sports scores
    Ye, Wei
    Ye, W. (yziy@sina.com), 1600, CESER Publications, Post Box No. 113, Roorkee, 247667, India (47): : 478 - 485
  • [49] Nested Regression Based Optimal Selection (NRBOS) of Rational Polynomial Coefficients
    Long Tengfei
    Jiao Weili
    He Guojin
    PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING, 2014, 80 (03): : 261 - 269
  • [50] Optimal Feature Selection for Pedestrian Detection based on Logistic Regression Analysis
    Kim, Jonghee
    Lee, Jonghwan
    Lee, Chungsu
    Park, Eunsoo
    Kim, Junmin
    Kim, Hakil
    Lee, Jaeeun
    Jeong, Hoeri
    2013 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2013), 2013, : 239 - 242