On the selection of optimal subdata for big data regression based on leverage scores

Cited by: 0
Authors
Chasiotis, Vasilis [1]
Karlis, Dimitris [1]
Affiliation
[1] Athens Univ Econ & Business, Dept Stat, Athens, Greece
Keywords
D-optimal designs; Design of experiments; Subdata; Linear regression; Information matrix
DOI
10.1007/s42519-024-00420-4
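Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract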
The demand for computational resources in the modeling process grows with the scale of the dataset, since traditional regression approaches involve inverting huge data matrices. The main problem lies in the large data size, so a standard approach is subsampling, which aims to obtain the most informative portion of the big data. In the current paper, we explore an existing approach based on leverage scores that was proposed for subdata selection in linear model discrimination. Our objective is to propose this approach for selecting the most informative data points to estimate the unknown parameters in both the first-order linear model and a model with interactions. Through simulation experiments and a real data application, we conclude that the approach based on leverage scores improves on existing approaches.
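To make the idea of leverage-based subdata selection concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the simplest deterministic rule: keep the k rows with the largest leverage scores h_ii, which are the diagonal entries of the hat matrix H = X(X^T X)^{-1} X^T. It then compares the D-optimality criterion log det(X_s^T X_s) of that subdata with a simple random subsample of the same size. The helper names (`leverage_scores`, `select_subdata`, `log_det_information`, `with_pairwise_interactions`), the sample sizes, and the simulated coefficients are all hypothetical and chosen only for illustration.

```python
import numpy as np


def leverage_scores(X: np.ndarray) -> np.ndarray:
    """Diagonal h_ii of the hat matrix H = X (X^T X)^{-1} X^T.

    Uses a thin QR decomposition X = QR, so that H = Q Q^T and
    h_ii is the squared Euclidean norm of the i-th row of Q.
    This avoids forming or inverting X^T X explicitly.
    """
    Q, _ = np.linalg.qr(X, mode="reduced")
    return np.einsum("ij,ij->i", Q, Q)


def select_subdata(X: np.ndarray, k: int) -> np.ndarray:
    """Row indices of the k points with the largest leverage scores.

    Assumption: a deterministic top-k rule (hypothetical helper); other
    leverage-based rules, e.g. sampling with probability proportional
    to h_ii, also exist.
    """
    h = leverage_scores(X)
    return np.argsort(h)[::-1][:k]


def log_det_information(X_sub: np.ndarray) -> float:
    """D-optimality criterion: log det of the subdata information matrix X_s^T X_s."""
    sign, logdet = np.linalg.slogdet(X_sub.T @ X_sub)
    return float(logdet) if sign > 0 else float("-inf")


def with_pairwise_interactions(Z: np.ndarray) -> np.ndarray:
    """Design matrix [1, z_1, ..., z_p, z_i * z_j (i < j)] for a model with interactions."""
    n, p = Z.shape
    cols = [np.ones(n)] + [Z[:, j] for j in range(p)]
    cols += [Z[:, i] * Z[:, j] for i in range(p) for j in range(i + 1, p)]
    return np.column_stack(cols)


# Simulated full data (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
n, p, k = 20_000, 3, 200
Z = rng.standard_normal((n, p))
X = np.column_stack([np.ones(n), Z])  # first-order model with intercept
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.standard_normal(n)

# Leverage-based subdata vs. a simple random subsample of the same size.
idx = select_subdata(X, k)
rand_idx = rng.choice(n, size=k, replace=False)
beta_hat, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)

print("estimate from subdata:", np.round(beta_hat, 3))
print("log det, leverage subdata:", round(log_det_information(X[idx]), 2))
print("log det, random subsample:", round(log_det_information(X[rand_idx]), 2))

# The same selection applies unchanged to a model with pairwise interactions.
X_int = with_pairwise_interactions(Z)
idx_int = select_subdata(X_int, k)
```

The sketch computes leverage scores through QR rather than by inverting X^T X, which keeps the computation numerically stable. Selection depends only on X and never on y, so ordinary least squares on the selected rows remains unbiased under a correctly specified model. The high-leverage points lie far from the center of the design, and they typically yield a much larger information determinant than a random subsample of the same size.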
Pages: 19
Related papers
50 in total
  • [31] Privacy preserving based logistic regression on big data
    Fan, Yongkai
    Bai, Jianrong
    Lei, Xia
    Zhang, Yuqing
    Zhang, Bin
    Li, Kuan-Ching
    Tan, Gang
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2020, 171
  • [32] How organisations leverage Big Data: a maturity model
    Comuzzi, Marco
    Patel, Anit
    INDUSTRIAL MANAGEMENT & DATA SYSTEMS, 2016, 116 (08) : 1468 - 1492
  • [33] Optimal subsampling proportional subdistribution hazards regression with rare events in big data
    Li, Erqian
    Tang, Man-Lai
    Tian, Maozai
    Yu, Keming
    STATISTICS AND ITS INTERFACE, 2025, 18 (03) : 361 - 377
  • [34] Bayesian scale mixtures of normals linear regression and Bayesian quantile regression with big data and variable selection
    Chu, Yuanqi
    Yin, Zhouping
    Yu, Keming
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2023, 428
  • [35] AN OPTIMAL SELECTION OF REGRESSION VARIABLES
    SHIBATA, R
    BIOMETRIKA, 1981, 68 (01) : 45 - 54
  • [36] Leveraging for big data regression
    Ma, Ping
    Sun, Xiaoxiao
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2015, 7 (01) : 70 - 76
  • [37] Selection of Audio Learning Resources Based on Big Data
    Wang, Peng
    Wang, Xia
    Liu, Xia
    INTERNATIONAL JOURNAL OF EMERGING TECHNOLOGIES IN LEARNING, 2022, 17 (06) : 23 - 38
  • [38] Optimal Quantization for Big Data Based on the Dynamic Programming
    Li, Punan
    PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21), 2021
  • [39] Quantile regression in big data: A divide and conquer based strategy
    Chen, Lanjue
    Zhou, Yong
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2020, 144
  • [40] The Ship Collision Accidents Based on Logistic Regression and Big Data
    Wang, Yi-han
    Ou, Yang
    Deng, Xu-dong
    Zhao, Lu-ran
    Zhang, Chao-yu
    PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019), 2019 : 4438 - 4440