The A-optimal subsampling approach to the analysis of count data of massive size

Cited by: 1
Authors
Tan, Fei [1]
Zhao, Xiaofeng [2]
Peng, Hanxiang [1]
Affiliations
[1] Indiana Univ Indianapolis, Dept Math Sci, Indianapolis, IN USA
[2] North China Univ Water Resources & Elect Power, Sch Math & Stat, Zhengzhou, Henan, Peoples R China
Keywords
A-optimality; big data; generalised linear models; negative binomial regression; optimal subsampling; Poisson regression; hat matrix; truncation
DOI
10.1080/10485252.2024.2383307
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
The uniform distribution and the (nonuniform) distribution based on statistical leverage scores are often used in developing randomised algorithms and analysing data of massive size. Neither distribution, however, is effective at extracting the important information in the data. In this article, we construct A-optimal subsampling estimators of the parameters in generalised linear models (GLMs) to approximate the full-data estimators, and derive the A-optimal distributions from the criterion of minimising the sum of the component variances of the subsampling estimators. Because calculating these distributions has the same time complexity as computing the full-data estimator, we use iteratively reweighted least squares to generalise to GLMs the Scoring Algorithm that Zhang, Tan, and Peng ((2023), 'Sample Size Determination for Multidimensional Parameters and A-Optimal Subsampling in a Big Data Linear Regression Model', to appear in the Journal of Statistical Computation and Simulation; preprint available at https://math.indianapolis.iu.edu/hanxpeng/SSD_23_4.pdf) introduced for a Big Data linear model. The paper presents a comprehensive numerical evaluation of our approach on simulated and real data, comparing its performance with uniform and leverage-score subsampling. The results showed that our approach substantially outperformed both uniform and leverage-score subsampling, and that the Scoring Algorithm significantly reduced the computing time required to implement the full-data estimator.
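As a rough illustration of the two-step idea in the abstract, the following pure-Python sketch fits a Poisson regression (log link) on an optimally weighted subsample. It ASSUMES the A-optimal probabilities take the form π_i ∝ |y_i − μ̃_i| · ‖M̃⁻¹ x_i‖ that is common in the GLM optimal-subsampling literature, where a uniform pilot subsample supplies the pilot estimate β̃ and the information matrix M̃. This is not necessarily the authors' exact distribution, and it omits the hat-matrix and truncation refinements named in the keywords. The function names (poisson_fit, a_optimal_probs, two_step_subsample, rpois), the pilot size r0, the subsample size r and the simulated data are all hypothetical. Because the log link is canonical for the Poisson model, the Fisher-scoring update used here coincides with iteratively reweighted least squares.

```python
import math
import random


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for k in range(c, p + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (M[r][p] - sum(M[r][k] * x[k] for k in range(r + 1, p))) / M[r][r]
    return x


def mean_of(beta, xi):
    """Poisson mean under the log link: mu = exp(x^T beta)."""
    return math.exp(sum(b * v for b, v in zip(beta, xi)))


def poisson_fit(X, y, w=None, iters=50, tol=1e-10):
    """Weighted Poisson MLE by Fisher scoring (equivalent to IRLS for the canonical log link)."""
    n, p = len(X), len(X[0])
    w = w if w is not None else [1.0] * n
    beta = [0.0] * p
    for _ in range(iters):
        mu = [mean_of(beta, xi) for xi in X]
        # Weighted information sum_i w_i mu_i x_i x_i^T and score sum_i w_i (y_i - mu_i) x_i.
        info = [[sum(w[i] * mu[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        score = [sum(w[i] * (y[i] - mu[i]) * X[i][j] for i in range(n)) for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def a_optimal_probs(X, y, beta, M):
    """ASSUMED A-optimal form: pi_i proportional to |y_i - mu_i| * ||M^{-1} x_i||."""
    p = len(beta)
    # Column k of M^{-1} is the solution of M z = e_k.
    cols = [solve(M, [1.0 if j == k else 0.0 for j in range(p)]) for k in range(p)]
    scores = []
    for xi, yi in zip(X, y):
        z = [sum(cols[k][j] * xi[k] for k in range(p)) for j in range(p)]  # M^{-1} x_i
        scores.append(abs(yi - mean_of(beta, xi)) * math.sqrt(sum(t * t for t in z)))
    total = sum(scores)
    return [s / total for s in scores]


def two_step_subsample(X, y, r0, r, rng):
    """Hypothetical two-step scheme: uniform pilot, then A-optimal sampling with replacement."""
    n, p = len(X), len(X[0])
    # Step 1: a uniform pilot subsample gives the pilot estimate and information matrix.
    pilot = rng.sample(range(n), r0)
    Xp, yp = [X[i] for i in pilot], [y[i] for i in pilot]
    beta0 = poisson_fit(Xp, yp)
    mu0 = [mean_of(beta0, xi) for xi in Xp]
    M = [[sum(m * xi[j] * xi[k] for m, xi in zip(mu0, Xp)) / r0 for k in range(p)]
         for j in range(p)]
    # Step 2: draw r indices from the A-optimal distribution and fit the
    # inverse-probability-weighted MLE (weights 1 / (n * pi_i)) on that subsample.
    probs = a_optimal_probs(X, y, beta0, M)
    idx = rng.choices(range(n), weights=probs, k=r)
    w = [1.0 / (n * probs[i]) for i in idx]
    return poisson_fit([X[i] for i in idx], [y[i] for i in idx], w), probs


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Hypothetical simulated data: intercept column plus one covariate, true beta = (0.5, 1.0).
rng = random.Random(2024)
X = [[1.0, rng.uniform(-1.0, 1.0)] for _ in range(10000)]
y = [rpois(mean_of([0.5, 1.0], xi), rng) for xi in X]
beta_hat, probs = two_step_subsample(X, y, r0=500, r=1000, rng=rng)
print(beta_hat)
```

With these settings the subsample estimate should land near the simulated coefficients. Replacing `probs` with the constant 1/n in step 2 turns the sketch into the uniform-subsampling baseline that the abstract compares against.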
Pages: 29