The A-optimal subsampling approach to the analysis of count data of massive size
Cited by: 1
Authors:
Tan, Fei [1]
Zhao, Xiaofeng [2]
Peng, Hanxiang [1]
Affiliations:
[1] Indiana Univ Indianapolis, Dept Math Sci, Indianapolis, IN USA
[2] North China Univ Water Resources & Elect Power, Sch Math & Stat, Zhengzhou, Henan, Peoples R China
Keywords:
A-optimality; big data; generalised linear models; negative binomial regression; optimal subsampling; Poisson regression; hat matrix; truncation
DOI:
10.1080/10485252.2024.2383307
Chinese Library Classification (CLC):
O21 [Probability Theory and Mathematical Statistics];
C8 [Statistics];
Discipline classification codes:
020208; 070103; 0714
Abstract:
The uniform and the statistical leverage-scores-based (nonuniform) distributions are often used in the development of randomised algorithms and in the analysis of data of massive size. Neither distribution, however, is effective at extracting the important information in the data. In this article, we construct the A-optimal subsampling estimators of the parameters in generalised linear models (GLM) to approximate the full-data estimators, and derive the A-optimal distributions from the criterion of minimising the sum of the component variances of the subsampling estimators. Because calculating these distributions has the same time complexity as calculating the full-data estimator, we generalise the Scoring Algorithm introduced in Zhang, Tan, and Peng ((2023), 'Sample Size Determination for Multidimensional Parameters and A-Optimal Subsampling in a Big Data Linear Regression Model', to appear in the Journal of Statistical Computation and Simulation. Preprint available at https://math.indianapolis.iu.edu/hanxpeng/SSD_23_4.pdf) from a Big Data linear model to GLM using iterative weighted least squares. The paper presents a comprehensive numerical evaluation of our approach on simulated and real data, comparing its performance with uniform and leverage-score subsampling. The results show that our approach substantially outperforms uniform and leverage-score subsampling, and that the Algorithm significantly reduces the computing time required to obtain the full-data estimator.
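The abstract does not state the sampling distribution in closed form, so the following is only a minimal Python sketch of the general idea for Poisson regression, not the authors' algorithm. It assumes sampling probabilities proportional to |y_i - mu_i| * ||M^{-1} x_i||, a form that is common in the optimal-subsampling literature, computed from a uniform pilot subsample, and it then fits an inverse-probability-weighted estimator by iterative weighted least squares. The function names, the pilot size `r0`, and the 10% uniform mixing used to keep the weights bounded are all hypothetical choices made for illustration. The sketch scores every row in one full pass and does not implement the paper's Scoring Algorithm.

```python
import numpy as np


def fit_poisson_irls(X, y, weights=None, max_iter=50, tol=1e-8):
    """Weighted Poisson regression (log link) by iterative weighted least squares."""
    n, p = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.mean()  # rescaling the weights does not change the solution
    beta = np.zeros(p)
    for _ in range(max_iter):
        mu = np.exp(np.clip(X @ beta, -30.0, 30.0))  # clip to avoid overflow
        score = X.T @ (w * (y - mu))                 # weighted score vector
        info = X.T @ (X * (w * mu)[:, None])         # weighted information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def subsample_poisson(X, y, r0, r, rng, mix=0.1):
    """Two-step subsampling estimator (illustrative assumption, see lead-in)."""
    n = X.shape[0]
    # Step 1: pilot estimate from a uniform subsample of size r0.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = fit_poisson_irls(X[idx0], y[idx0])
    mu0 = np.exp(np.clip(X @ beta0, -30.0, 30.0))
    # Pilot approximation of the information matrix M.
    M = X[idx0].T @ (X[idx0] * mu0[idx0][:, None]) / r0
    # Step 2: scores |y_i - mu_i| * ||M^{-1} x_i||, normalised to probabilities.
    scores = np.abs(y - mu0) * np.linalg.norm(np.linalg.solve(M, X.T), axis=0)
    pi = (1.0 - mix) * scores / scores.sum() + mix / n
    # Step 3: nonuniform subsample and inverse-probability-weighted fit.
    idx = rng.choice(n, size=r, replace=True, p=pi)
    beta = fit_poisson_irls(X[idx], y[idx], weights=1.0 / pi[idx])
    return beta, pi


# Usage on simulated count data (all settings are illustrative).
rng = np.random.default_rng(0)
n = 20000
X = np.column_stack([np.ones(n), 0.5 * rng.standard_normal((n, 2))])
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))

beta_full = fit_poisson_irls(X, y)
beta_sub, pi = subsample_poisson(X, y, r0=500, r=2000, rng=rng)
print("full-data estimate:", beta_full)
print("subsample estimate:", beta_sub)
```

Weighting each sampled row by the inverse of its sampling probability is what lets the subsample estimator target the full-data estimator rather than a distorted version of it.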