Sample size determination for multidimensional parameters and the A-optimal subsampling in a big data linear regression model

Authors
Zhang, Sheng [1]
Tan, Fei [1]
Peng, Hanxiang [1]
Affiliations
[1] Indiana Univ Indianapolis, Dept Math Sci, 402 N Blackford St LD 270, Indianapolis, IN 46202 USA
Keywords
Asymptotic normality; A-optimality; big data; least squares estimate; sample size determination; APPROXIMATION
DOI
10.1080/00949655.2024.2434669
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
To efficiently approximate the least squares estimator (LSE) in a Big Data linear regression model using a subsampling approach, optimal sampling distributions were derived by minimizing the trace norm of the covariance matrix of a smooth function of the subsampling LSE. An algorithm was developed that significantly reduces the computation time for the subsampling LSE compared to the full-sample LSE. Additionally, the subsampling LSE was shown to be asymptotically normal almost surely for an arbitrary sampling distribution under suitable conditions. Motivated by the need for subsampling in Big Data analysis and data splitting in machine learning, we investigated sample size determination (SSD) for multidimensional parameters and derived analytical formulas for calculating sample sizes. Through extensive simulations and real-world data applications, we assessed the numerical properties of both the subsampling approach and SSD methodology. Our findings revealed that the A-optimal subsampling method significantly outperformed uniform and leverage-score subsampling techniques. Furthermore, the algorithm considerably reduced the computational time required for implementing the full sample LSE. Additionally, the SSD provided a theoretical basis for selecting sample sizes.
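The subsampling idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the A-optimal probabilities, so the score used below (absolute pilot residual times the norm of the pilot-inverse-Gram-transformed covariate) is an assumption modeled on common A-optimality-style scores. The function names `pilot_based_probs` and `subsample_lse`, the pilot size `r0`, and the uniform mixing fraction `mix` are all hypothetical.

```python
import numpy as np

def pilot_based_probs(X, y, r0, rng, mix=0.1):
    """Sampling probabilities from a uniform pilot subsample.

    ASSUMPTION: score_i = |residual_i| * ||G^{-1} x_i||, an A-optimality-style
    score; the paper's exact formula is not given in the abstract.
    """
    n = X.shape[0]
    pilot = rng.choice(n, size=r0, replace=True)      # uniform pilot draw
    Xp, yp = X[pilot], y[pilot]
    beta0 = np.linalg.solve(Xp.T @ Xp, Xp.T @ yp)     # pilot LSE
    G = Xp.T @ Xp / r0                                # pilot Gram matrix
    resid = np.abs(y - X @ beta0)                     # |y_i - x_i' beta0|
    norms = np.linalg.norm(np.linalg.solve(G, X.T).T, axis=1)
    scores = resid * norms
    # Mix with the uniform distribution so every probability stays positive.
    return (1.0 - mix) * scores / scores.sum() + mix / n

def subsample_lse(X, y, r, probs, rng):
    """Inverse-probability-weighted LSE on r rows drawn with replacement."""
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    w = 1.0 / (r * probs[idx])                        # sampling weights
    Xs, ys = X[idx], y[idx]
    Xw = Xs * w[:, None]
    # Solve (Xs' W Xs) beta = Xs' W ys.
    return np.linalg.solve(Xw.T @ Xs, Xw.T @ ys)
```

The weights `1 / (r * probs[idx])` make the weighted normal equations unbiased estimates of their full-sample counterparts, which is why an arbitrary sampling distribution can be plugged into `subsample_lse`. Note that this sketch still computes scores over all rows, so it saves only the cost of the final solve.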
Pages: 628-653
Page count: 26